  Apr 2025
    Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study addresses the question of how task-relevant sensory information affects activity in the motor cortex. The authors use various approaches to address this question, looking at single units and population activity. They find that there are three subtypes of modulation by sensory information at the single unit level. Population analyses reveal that sensory information affects the neural activity orthogonally to motor output. The authors then compare both single unit and population activity to computational models to investigate how encoding of sensory information at the single unit level is coordinated in a network. They find that an RNN that displays similar orbital dynamics and sensory modulation to the motor cortex also contains nodes that are modulated similarly to the three subtypes identified by the single unit analysis.

      Strengths:

      The strengths of this study lie in the population analyses and the approach of comparing single-unit encoding to population dynamics. In particular, the analysis in Figure 3 is very elegant and informative about the effect of sensory information on motor cortical activity.

      The task is also well designed to suit the questions being asked and well controlled.

      We appreciate these kind comments.

      It is commendable that the authors compare single units to population modulation. The addition of the RNN model and perturbations strengthens the conclusion that the subtypes of individual units all contribute to the population dynamics. However, the subtypes (PD shift, gain, and addition) are not sufficiently justified. The authors also do not address that single units exhibit mixed modulation, but RNN units are not treated as such.

      We apologize for not providing sufficient grounds for introducing the subtypes. We have addressed this in the revised manuscript (Lines 102-104):

      “We determined these modulations on the basis of the classical cosine tuning model (Georgopoulos et al., 1982) and several previous studies (Bremner and Andersen, 2012; Pesaran et al., 2010; Sergio et al., 2005).”

      In our study, we used the subtype analysis as a criterion to identify modulation patterns in the neuronal population, rather than to sort neurons into mutually exclusive cell types.
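      To make the three modulation types concrete, they can be written as variations of the classical cosine tuning model, f(θ) = b0 + g·cos(θ − PD), where θ is reach direction. The sketch below is a hypothetical parameterization for illustration only: it assumes each modulation depends linearly on a signed target speed v (v = 0 for a static target), and the coefficient names k_shift, k_gain and k_add and all default values are invented here, not taken from the manuscript's fitted models.

```python
import math


def cosine_tuning(theta, baseline, gain, pd):
    """Classical cosine tuning: firing rate for reach direction theta (radians)."""
    return baseline + gain * math.cos(theta - pd)


# Hypothetical linear dependence on signed target speed v (assumption, not from the paper).
def pd_shift_model(theta, v, baseline=20.0, gain=10.0, pd=0.0, k_shift=0.5):
    # The preferred direction rotates with target speed.
    return cosine_tuning(theta, baseline, gain, pd + k_shift * v)


def gain_model(theta, v, baseline=20.0, gain=10.0, pd=0.0, k_gain=0.3):
    # The tuning depth is scaled by target speed.
    return cosine_tuning(theta, baseline, gain * (1.0 + k_gain * v), pd)


def additive_model(theta, v, baseline=20.0, gain=10.0, pd=0.0, k_add=4.0):
    # The whole tuning curve is offset by target speed.
    return cosine_tuning(theta, baseline + k_add * v, gain, pd)


for v in (-1.0, 0.0, 1.0):
    print(v, pd_shift_model(0.0, v), gain_model(0.0, v), additive_model(0.0, v))
```

      With v = 0 all three functions reduce to the same cosine curve; they differ only in which parameter target motion acts on, which is why a single neuron can in principle show more than one of these effects at once.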

      Weaknesses:

      The main weaknesses of the study lie in the categorization of the single units into PD shift, gain, and addition types. The single units exhibit clear mixed selectivity, as the authors highlight. Therefore, the subsequent analyses looking only at the individual classes in the RNN are a little limited. Another weakness of the paper is that the choice of windows for analyses is not properly justified and the dependence of the results on the time windows chosen for single-unit analyses is not assessed. This is particularly pertinent because tuning curves are known to rotate during movements (Sergio et al. 2005 Journal of Neurophysiology).

      In our study, mixed selectivity, or more specifically the target-motion modulation of reach-direction tuning, is a significant feature of single neurons. We categorized the neurons into three subclasses not to claim that they are distinct cell types, but to distinguish patterns of target-motion modulation. To further characterize these three patterns, we also investigated their interactions by perturbing connection weights in the RNN.

      We agree that it is important to consider the role of rotating tuning curves in neural dynamics during interception. In our case, we examined the population neural state with sliding windows and focused on the period around movement onset (MO), because this period showed the unexpected ring-like structure and the highest decoding accuracy of transferred decoders (Figure S7C). The single-unit analyses were then performed in this window.

      This paper shows sensory information can affect motor cortical activity whilst not affecting motor output. However, it is not the first to do so and fails to cite other papers that have investigated sensory modulation of the motor cortex (Stavisky et al. 2017 Neuron, Pruszynski et al. 2011 Nature, Omrani et al. 2016 eLife). These studies should be mentioned in the Introduction to better capture the context around the present study. It would also be beneficial to add a discussion of how the results compare to the findings from these other works.

      Thank you for the reminder. We now cite these relevant studies in the revised manuscript (Lines 422-426):

      “To further clarify, the discussing target-motion effect is different from the sensory modulation in action selection (Cisek and Kalaska, 2005), motor planning (Pesaran et al., 2006), visual replay and somatosensory feedback (Pruszynski et al., 2011; Stavisky et al., 2017; Suway and Schwartz, 2019; Tkach et al., 2007), because it occurred around movement onset and in predictive control trial-by-trial.”

      This study also uses insights from single-unit analysis to inform mechanistic models of these population dynamics, which is a powerful approach, but is dependent on the validity of the single-cell analysis, which I have expanded on below.

      I have clarified some of the areas that would benefit from further analysis below:

      (1) Task:

      The task is well designed, although it would have benefited from perhaps one more target speed (for each direction). One monkey appears to have experienced one more target speed than the others (seen in Figure 3C). It would have been nice to have this data for all monkeys.

      This is a great suggestion; unfortunately, it is not feasible because the Utah arrays have already been removed.

      (2) Single unit analyses:

      In some analyses, the effects of target speed look more driven by target movement direction (e.g. Figures 1D and E). To confirm target speed is the main modulator, it would be good to compare how much more variance is explained by models including speed rather than just direction. More target speeds may have been helpful here too.

      Thank you for this suggestion. The goodness of fit of the simple model (movement direction only) is much worse than that of the more complex models that include target speed. We have added these results to the revised manuscript (Lines 119-122): “We found that the adjusted R2 of a full model (0.55 ± 0.24, mean ± sd.) can be higher than that of the PD shift (0.47 ± 0.24), gain (0.46 ± 0.22), additive (0.41 ± 0.26), and simple models (only reach direction, 0.34 ± 0.25) for three monkeys (1162 neurons, ranksum test, one-tailed, p<0.01, Figure S5).”
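      Adjusted R² is the appropriate metric here because the candidate models have different numbers of parameters: R²_adj = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors (excluding the intercept). The sketch below is illustrative only. It uses noiseless synthetic data generated from an additive ground truth, plain ordinary least squares, and hypothetical design matrices (a simple cosine model versus one with an added target-speed term); it does not reproduce the manuscript's actual models or fitting procedure.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_adjusted_r2(rows, y):
    """Fit OLS via the normal equations and return adjusted R^2.

    Each design-matrix row starts with 1.0 for the intercept.
    """
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    yhat = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    p = k - 1  # predictors, excluding the intercept
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Synthetic data: 8 reach directions x 3 hypothetical signed target speeds.
directions = [2 * math.pi * i / 8 for i in range(8)]
speeds = [-1.0, 0.0, 1.0]
trials = [(th, v) for v in speeds for th in directions]
y = [20.0 + 10.0 * math.cos(th) + 4.0 * v for th, v in trials]  # additive ground truth

simple = [[1.0, math.cos(th), math.sin(th)] for th, v in trials]
additive = [[1.0, math.cos(th), math.sin(th), v] for th, v in trials]
adj_simple = fit_adjusted_r2(simple, y)
adj_add = fit_adjusted_r2(additive, y)
print(adj_simple, adj_add)
```

      Because the data are noiseless, the additive model fits perfectly here; with real spike counts the gap between models is smaller, and the comparison would be made across neurons as in the quoted result.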

      The choice of the three categories (PD shift, gain, addition) is not completely justified in a satisfactory way. It would be nice to see whether these three main categories are confirmed by unsupervised methods.

      This is a good point. Unfortunately, we have not yet found an appropriate unsupervised method for this purpose.

      The decoder analyses in Figure 2 provide evidence that target speed modulation may change over the trial. Therefore, it is important to see how the window considered for the firing rate in Figure 1 (currently 100 ms before to 100 ms after movement onset) affects the results.

      Thank you for the suggestion and the close reading. Because movement onset (MO) is the key time point of this study, we colored this period in Figure 1 to highlight peri-movement neuronal activity.

      (3) Decoder:

      One feature of the task is that the reach endpoints tile the entire perimeter of the target circle (Figure 1B). However, this feature is not exploited for much of the single-unit analyses. This is most notable in Figure 2, where the use of an SVM limits the decoding to discrete values (the endpoints are divided into 8 categories). Using continuous decoding of hand kinematics would be more appropriate for this task.

      This is a very reasonable suggestion. In the revised manuscript, we have added continuous decoding results using support vector regression (SVR) in Figure S7A and Lines 170-173:

      “These results were stable on the data of the other two monkeys and the pseudopopulation of all three monkeys (Figure S6) and reconfirmed by the continuous decoding results with support vector regressions (Figure S7A), suggesting that target motion information existed in M1 throughout almost the entire trial.”

      (4) RNN:

      Mixed selectivity is not analysed in the RNN, which would help to compare the model to the real data where mixed selectivity is common. Furthermore, it would be informative to compare the neural data to the RNN activity using canonical correlation or Procrustes analyses. These would help validate the claim of similarity between RNN and neural dynamics, rather than allowing comparisons to be dominated by geometric similarities that may be features of the task. There is also an absence of alternate models to compare the perturbation model results to.

      Thank you for these helpful suggestions. We have performed a decoding analysis on the RNN units and added the results to Figure S12A and Lines 333-334: “First, from the decoding result, target motion information existed in nodes’ population dynamics shortly after TO (Figure S12A).”

      We have also included the results of canonical correlation analysis and Procrustes analysis in Table S2 and Lines 340-342: “We then performed canonical component analysis (CCA) and Procrustes analysis (Table S2; see Methods), the results also indicated the similarity between network dynamics and neural dynamics.”

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Zhang et al. examine neural activity in the motor cortex as monkeys make reaches in a novel target interception task. Zhang et al. begin by examining the single neuron tuning properties across different moving target conditions, finding several classes of neurons: those that shift their preferred direction, those that change their modulation gain, and those that shift their baseline firing rates. The authors go on to find an interesting, tilted ring structure of the neural population activity, depending on the target speed, and find that (1) the reach direction has consistent positioning around the ring, and (2) the tilt of the ring is highly predictive of the target movement speed. The authors then model the neural activity with a single neuron representational model and a recurrent neural network model, concluding that this population structure requires a mixture of the three types of single neurons described at the beginning of the manuscript.

      Strengths:

      I find the task the authors present here to be novel and exciting. It slots nicely into an overall trend to break away from a simple reach-to-static-target task to better characterize the breadth of how the motor cortex generates movements. I also appreciate the movement from single neuron characterization to population activity exploration, which generally serves to anchor the results and make them concrete. Further, the orbital ring structure of population activity is fascinating, and the modeling work at the end serves as a useful baseline control to see how it might arise.

      Thank you for your recognition of our work.

      Weaknesses:

      While I find the behavioral task presented here to be excitingly novel, I find the presented analyses and results to be far less interesting than they could be. Key to this, I think, is that the authors are examining this task and related neural activity primarily with a single-neuron representational lens. This would be fine as an initial analysis since the population activity is of course composed of individual neurons, but the field seems to have largely moved towards a more abstract "computation through dynamics" framework that has, in the last several years, provided much more understanding of motor control than the representational framework has. As the manuscript stands now, I'm not entirely sure what interpretation to take away from the representational conclusions the authors made (i.e. the fact that the orbital population geometry arises from a mixture of different tuning types). As such, by the end of the manuscript, I'm not sure I understand any better how the motor cortex or its neural geometry might be contributing to the execution of this novel task.

      This paper shows sensory modulation of motor tuning in single units and in the neural population during the motor execution period. We acknowledge that the findings are constrained to certain time windows. We are continuing to work on this task and hope to address these questions in future work.

      Main Comments:

      My main suggestions to the authors revolve around bringing in the computation-through-dynamics framework to strengthen their population results. The authors cite the Vyas et al. review paper on the subject, so I believe they are aware of this framework. I have three suggestions for improving or adding to the population results:

      (1) Examination of delay period activity: one of the most interesting aspects of the task was the fact that the monkey had a random-length delay period before he could move to intercept the target. Presumably, the monkey had to prepare to intercept at any time between 400 and 800 ms, which means that there may be some interesting preparatory activity dynamics during this period. For example, after 400 ms, does the preparatory activity rotate with the target such that once the go cue happens, the correct interception can be executed? There is some analysis of the delay period population activity in the supplement, but it doesn't quite get at the question of how the interception movement is prepared. This is perhaps the most interesting question that can be asked with this experiment, and it's one that I think may be quite novel for the field; it is a shame that it isn't discussed.

      This is a great idea! We are working on it, and it looks promising.

      (2) Supervised examination of population structure via potent and null spaces: simply examining the first three principal components revealed an orbital structure, with a seemingly conserved motor output space and a dimension orthogonal to it that relates to the visual input. However, the authors don't push this insight any further. One way to do that would be to find the "potent space" of motor cortical activity by regression to the arm movement and examine how the tilted rings look in that space (this is actually fairly easy to see in the reach direction components of the dPCA plot in the supplement; the rings will be highly aligned in this space). Presumably, then, the null space should contain information about the target movement. dPCA shows that there's not a single dimension that clearly delineates target speed, but the ring tilt is likely evident if the authors look at the highest variance neural dimension orthogonal to the potent space (the "null space"); this is akin to PC3 in the current figures, but it would be nice to see what comes out when you look in the data for it.

      Thank you for this nice suggestion. While it was feasible to identify potent subspaces encoding reach direction and null spaces for target-velocity modulation, as the reviewer suggested, the challenge remained that unsupervised methods were insufficient to isolate a pure target-velocity subspace from the many possible candidates, owing to the small variance of target-velocity information. Although dPCA components can be used to construct orthogonal subspaces for individual task variables, we found that the target-velocity information remained highly entangled with the reach-direction representation. More details are provided in Figure S8C and its caption, reproduced below:

      “We used dPCA components with different features to construct three subspaces (same data in A, reach-direction space #3, #4, #5; target-velocity space #10, #15, #17; interaction space #6, #11, #12), and we projected trial-averaged data into these orthogonal subspaces using different colormaps. This approach allowed us to obtain a “potent subspace” coding reach direction and a “null space” for target velocity. The results showed that the reach-direction subspace effectively represented the reach direction. However, while the target-velocity subspace encoded the target velocity information, it still contained reach-direction clusters within each target-velocity condition, corroborating the results of the addition model in the main text (Figure 4). The interaction subspace revealed that multiple reach-direction rings were nested within each other, similar to the findings from the gain model (Figure 3 & 4). The interaction subspace also captured more variance than target-velocity subspace, consistent with our PCA results, suggesting the target-velocity modulation primarily coexists with reach-direction coding. Furthermore, we explored alternative methods to verify whether orthogonal subspaces could effectively separate the reach direction and target velocity. We could easily identify the reach-direction subspace, but its orthogonal subspace was relatively large, and the target-velocity information exhibited only small variance, making it difficult to isolate a subspace that purely encodes target velocity.”
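      The potent/null decomposition discussed above can be sketched generically. Assuming a linear readout W (rows are hand-velocity components, columns are neurons or PCs) has already been estimated by regressing hand velocity on neural activity, the potent space is the row space of W, and the orthogonal projector onto the null space is P_null = I − Wᵀ(WWᵀ)⁻¹W. The matrix and state values below are hypothetical, and this plain-Python example is an illustration of the idea, not the dPCA-based analysis used in the manuscript.

```python
def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inv2(m):
    """Inverse of a 2 x 2 matrix (assumed non-singular)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def null_space_projector(w):
    """P_null = I - W^T (W W^T)^-1 W for a 2 x N readout W with full row rank."""
    n = len(w[0])
    p_potent = matmul(matmul(transpose(w), inv2(matmul(w, transpose(w)))), w)
    return [[(1.0 if i == j else 0.0) - p_potent[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical readout: 2 hand-velocity components from 3 neural dimensions.
w = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 1.0]]
x = [[2.0], [1.0], [3.0]]  # a neural state as a column vector
p_null = null_space_projector(w)
x_null = matmul(p_null, x)  # component that the readout cannot see
x_potent = [[x[i][0] - x_null[i][0]] for i in range(3)]
print(matmul(w, x_null))  # both entries are zero up to rounding
print(matmul(w, x_potent), matmul(w, x))  # the same readout
```

      Any activity confined to the null space leaves the readout unchanged, which is the sense in which target-velocity modulation there would not alter hand velocity. As the response notes, the practical difficulty is that this complement is large, so locating the low-variance target-velocity signal within it still requires an additional criterion.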

      (3) RNN perturbations: as it's currently written, the RNN modeling has promise, but the perturbations performed don't provide me with much insight. I think this is because the authors are trying to use the RNN to interpret the single neuron tuning, but it's unclear to me what was learned from perturbing the connectivity between what seems to me almost arbitrary groups of neurons (especially considering that 43% of nodes were unclassifiable). It seems to me that a better perturbation might be to move the neural state before the movement onset to see how it changes the output. For example, the authors could move the neural state from one tilted ring to another to see if the virtual hand then reaches a completely different (yet predictable) target. Moreover, if the authors can more clearly characterize the preparatory movement, perhaps perturbations in the delay period would provide even more insight into how the interception might be prepared.

      We are sorry that we did not clarify the definition of the “none” type, which may have been misleading. The 43% of unclassifiable nodes include inactive ones; when only active (task-related) nodes are included, the proportion of unclassifiable nodes is much lower. We recomputed the proportions using only active units and have updated Table 1. By perturbing the connectivity, we intended to explore the interactions between different modulation types.

      Thank you for this valuable advice. We considered moving neural states from one ring to another without changing the directional cluster. However, we concluded that this perturbation design might not be well suited to the question: because the top two PCs are highly correlated with movement direction, such a move (equivalent to exchanging two states within the same directional cluster but under different target-motion conditions) would presumably not affect the behavior.

      Reviewer #3 (Public Review):

      Summary:

      This experimental study investigates the influence of sensory information on neural population activity in M1 during a delayed reaching task. In the experiment, monkeys are trained to perform a delayed interception reach task, in which the goal is to intercept a potentially moving target.

      This paradigm allows the authors to investigate how, given a fixed reach endpoint (which is assumed to correspond to a fixed motor output), the sensory information regarding the target motion is encoded in neural activity.

      At the level of single neurons, the authors found that target motion modulates the activity in three main ways: gain modulation (scaling of the neural activity depending on the target direction), shift (shift of the preferred direction of neurons tuned to reach direction), or addition (offset to the neural activity).

      At the level of the neural population, target motion information was largely encoded along the 3rd PC of the neural activity, leading to a tilt of the manifold along which reach direction was encoded that was proportional to the target speed. The tilt of the neural manifold was found to be largely driven by the variation of activity of the population of gain-modulated neurons.

      Finally, the authors studied the behaviour of an RNN trained to generate the correct hand velocity given the sensory input and reach direction. The RNN units were found to similarly exhibit mixed selectivity to the sensory information, and the geometry of the “neural population” resembled that observed in the monkeys.

      Strengths:

      - The experiment is well set up to address the question of how sensory information that is directly relevant to the behaviour but does not lead to a direct change in behavioural output modulates motor cortical activity.

      - The finding that sensory information modulates the neural activity in M1 during motor preparation and execution is non-trivial, given that this modulation of the activity must occur in the nullspace of the movement.

      - The paper gives a complete picture of the effect of the target motion on neural activity, by including analyses at the single neuron level as well as at the population level. Additionally, the authors link those two levels of representation by highlighting how gain modulation contributes to shaping the population representation.

      Thank you for your recognition.

      Weaknesses:

      - One of the main premises of the paper is the fact that the motor output for a given reach point is preserved across different target motions. However, as the authors briefly mention in the conclusion, they did not record muscle activity during the task, but only hand velocity, making it impossible to directly verify how preserved muscle patterns were across movements. While the authors highlight that they did not see any difference in their results when resampling the data to control for similar hand velocities across conditions, this seems like an important potential caveat of the paper whose implications should be discussed further or highlighted earlier in the paper.

      Thank you for the suggestion. We now highlight the resampling results as an important control in the revised manuscript (Figure S11 and Lines 257-260):

      “To eliminate hand-speed effect, we resampled trials to construct a new dataset with similar distributions of hand speed in each target-motion condition and found similar orbital neural geometry. Moreover, the target-motion gain model provided a better explanation compared to the hand-speed gain model (Figure S11).”

      - The main takeaway of the RNN analysis is not fully clear. The authors find that an RNN trained given a sensory input representing a moving target displays modulation to target motion that resembles what is seen in real data. This is interesting, but the authors do not dissect why this representation arises, and how robust it is to various task design choices. For instance, it appears that the network should be able to solve the task using only the motion intention input, which contains the reach endpoint information. If the target motion input is not used for the task, it is not obvious why the RNN units would be modulated by this input (especially as this modulation must lie in the nullspace of the movement hand velocity if the velocity depends only on the reach endpoint). It would thus be important to see alternative models compared to true neural activity, in addition to the model currently included in the paper. Besides, for the model in the paper, it would therefore be interesting to study further how the details of the network setup (e.g., initial spectral radius of the connectivity, weight regularization, or using only the target position input) affect the modulation by the motion input, as well as the trained population geometry and the relative ratios of modulated cells after training.

      Great suggestions. In the revised manuscript, we have added the results of three alternative models in Table S4 and Lines 355-365:

      “We also tested three alternative network models: (1) only receives motor intention and a GO-signal; (2) only receives target location and a GO-signal; (3) initialized with sparse connection (sparsity=0.1); the unmentioned settings and training strategies were as the same as those for original models (Table S4; see Methods). The results showed that the three modulations could emerge in these models as well, but with obviously distinctive distributions. In (1), the ring-like structure became overlapped rings parallel to the PC1PC2 plane or barrel-like structure instead; in (2), the target-motion related tilting tendency of the neural states remained, but the projection of the neural states on the PC1-PC2 plane was distorted and the reach-direction clusters dispersed. These implies that both motor intention and target location seem to be needed for the proposed ring-like structure. The initialization of connection weights of the hidden layer can influence the network’s performance and neural state structure, even so, the ring-like structure”

      - Additionally, it is unclear what insights are gained from the perturbations to the network connectivity the authors perform, as it is generally expected that modulating the connectivity will degrade task performance and the geometry of the responses. If the authors wish to make claims about the role of the subpopulations, it could be interesting to test whether similar connectivity patterns develop in networks that are not initialized with an all-to-all random connectivity or to use ablation experiments to investigate whether the presence of multiple types of modulations confers any sort of robustness to the network.

      Thank you for these great suggestions. With the perturbations, we intended to explore the contribution of interactions between specific subpopulations. We have included ablation experiments in the revised manuscript (Table S3 and Lines 344-346): “The ablation experiments showed that losing any kind of modulation nodes would largely deteriorate the performance, and those nodes merely with PD-shift modulation could mostly impact the neural state structure (Table S3).”

      - The results suggest that the observed changes in motor cortical activity with target velocity result from M1 activity receiving an input that encodes the velocity information. This also appears to be the assumption in the RNN model. However, even though the input shown to the animal during preparation is indeed a continuously moving target, it appears that the only relevant quantity to the actual movement is the final endpoint of the reach. While this would have to be a function of the target velocity, one could imagine that the computation of where the monkeys should reach might be performed upstream of the motor cortex, in which case the actual target velocity would become irrelevant to the final motor output. This makes the results of the paper very interesting, but it would be nice if the authors could discuss further when one might expect to see modulation by sensory information that does not directly affect motor output in M1, and where those inputs may come from. It may also be interesting to discuss how the findings relate to previous work that has found behaviourally irrelevant information is being filtered out from M1 (for instance, Russo et al, Neuron 2020 found that in monkeys performing a cycling task, context can be decoded from SMA but not from M1, and Wang et al, Nature Communications 2019 found that perceptual information could not be decoded from PMd)?

      How and where sensory information modulates M1 are interesting open questions. We discuss them in the revised manuscript (Lines 435-446): “It would be interesting to explore whether other motor areas also allow sensory modulation during flexible interception. The functional differences between M1 and other areas lead to uncertain speculations. Although M1 has pre-movement activity, it is more related to task variables and motor outputs. Recently, a cycling task sets a good example that the supplementary motor area (SMA) encodes context information and the entire movement (Russo et al., 2020), while M1 preferably relates to cycling velocity (Saxena et al., 2022). The dorsal premotor area (PMd) has been reported to capture potential action selection and task probability, while M1 not (Cisek and Kalaska, 2005; Glaser et al., 2018; Wang et al., 2019). If the neural dynamics of other frontal motor areas are revealed, we might be able to tell whether the orbital neural geometry of mixed selectivity is unique in M1, or it is just inherited from upstream areas like PMd. Either outcome would provide us some insights into understanding the interaction between M1 and other frontal motor areas in motor planning.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      At times the writing was a little hard to parse. It could benefit from being fleshed out a bit to link sentences together better.

      There are a few grammatical errors, such as:

      "These results support strong and similar roles of gain and additive nodes, but what is even more important is that the three modulations interact each other, so the PD-shift nodes should not be neglected."

      should be

      "These results support strong and similar roles of gain and additive nodes, but what is even more important is that the three modulations interact WITH each other, so the PDshift nodes should not be neglected."

      The discussion could also be more extensive to benefit non-experts in the field.

      Thank you. We have proofread and polished the updated manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Other comments:

      - The authors mention mixed selectivity a few times, but Table 1 doesn't have a column for mixed selective neurons--this seems like an important oversight. Likewise, it would be good to see an example of a "mixed" neuron.

      - The structure of the writing in the results section often talked about the supplementary results before the main results - this seems backwards. If the supplementary results are important enough to come before the main figures, then they should not be supplementary. Otherwise, if the results are truly supplementary, they should come after the main results are discussed.

      - Line 305: Authors say "most" RNN units could be classified, and this is technically true, but only barely, according to Table 1. It might be good to put the actual percentage here in the text.

      - Figure 5a: typo ("Motion intention" rather than "Motor")

      - I couldn't find any mention of code or data availability in the manuscript.

      - There were a number of lines that didn't make much sense to me and should probably be rewritten or expanded on:

      - Lines 167-168: "These results qualitatively imply the interaction as that target speeds..."

      - Lines 178-179: "However, these neural trajectories were not yet the ideal description, because they were shaped mostly by time."

      - Lines 187-188: "...suggesting that target motion affects M1 neural dynamics via a topologically invariant transformation."

      - Lines 224-226: "Note that here we performed an linear transformation on all resulting neural state points to make the ellipse of the static condition orthogonal to the z-axis for better visualization." Does this mean that the z-axis is not PC 3 anymore?

      - Lines 272-274: "These simulations suggest that the existence of PD-shift and additive modulation would not disrupt the neural geometry that is primarily driven by gain modulation; rather it is possible that these three modulations support each other in a mixed population."

      Thank you for these detailed suggestions. By “mixed selectivity”, we mean joint tuning to both target motion and movement. In this sense, all target-motion-modulated neurons (regardless of modulation type) show mixed selectivity. The term “motor intention” refers to Mazzoni et al., 1996, Journal of Neurophysiology. We have also revised the manuscript for better readability.

      We have updated the data and code availability in Data availability as below:

      “The example experimental datasets and relevant analysis code have been deposited in Mendeley Data at https://data.mendeley.com/datasets/8gngr6tphf. The RNN relevant code and example model datasets are available at https://github.com/yunchenyc/RNN_ringlike_structure.“

      Reviewer #3 (Recommendations For The Authors):

      Minor typos:

      Line 153: “there were”

      Line 301: “network was trained to generate”

      Line 318: “interact with each other”

      Suggested reformulations:

      Line 310: “tilting angles followed a pattern similar to that seen in the data”

      Line 187: the claim of a “topologically invariant transformation” seems strong, as the analysis is quite qualitative.

      Suggested changes to the paper (aside from those mentioned in the main review): It could be nice to show behaviour in a main figure panel early on in the paper. This could help with the task description (as it would directly show how the trials are separated based on endpoint) and could allow for discussing the potential caveats of the assumption that behaviour is preserved.

      Thank you. We have corrected these typos and writing problems. As a similar task design has been reported previously, we ultimately decided not to provide extra figures or videos. We nonetheless thank the reviewer for this helpful suggestion.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript by Thornlow Lamson et al., the authors develop a "beads-on-a-string" (BOAS) strategy to link diverse hemagglutinin head domains to elicit broadly protective antibody responses. The authors are able to generate BOAS of varying formulations and lengths, and immunization of mice induces antibodies against a broad range of influenza subtypes. However, several major concerns are raised, including the stability of the BOAS, the use of only 3 mice for most immunization experiments, and the lack of important controls and analyses to determine how the BOAS format alone, rather than the inclusion of diverse heads, impacts humoral immunity.

      Strengths:

      Vaccine strategy is new and exciting.

      Analyses were performed to support conclusions and improve paper quality.

      Weaknesses:

      Lack of controls distinguishing how different hemagglutinin heads impact immunity from the effect of BOAS multivalency.

      Only 3 mice were used for most experiments.

      There were limited details on size exclusion data.

      We appreciate the reviewer’s comments and have made the following changes to the manuscript.

      (1) We recognize that deconvoluting the effect of including a diverse set of HA heads from that of multivalency in the BOAS immunogens is necessary to understand the impact on antigenicity. Therefore, we now include a cocktail of the identical eight HA heads used in the 8-mer and BOAS nanoparticle (NP) as an additional control group. While we observed HA binding titers similar to those of the 8-mer and BOAS NP groups, sera elicited by the cocktail were unable to neutralize any of the viruses tested; multivalency thus appears to be important for eliciting neutralizing responses.

      (2) We increased the sample size by repeating the immunizations with an additional n=5 mice, for a total of n=8 mice across two independent experiments.

      (3) We expanded the details on size exclusion data to include:

      a) extended chromatograms from Figure 2C as Supplemental Figure 3.

      b) additional details in the materials and methods section (lines 370-372):

      “Recovered proteins were then purified on a Superdex 200 (S200) Increase 10/300 GL (for trimeric HAs) or Superose 6 Increase 10/300 GL (for BOAS) size-exclusion column in Dulbecco’s Phosphate Buffered Saline (DPBS) within 48 hours of cobalt resin elution.”

      Reviewer #2 (Public Review):

      Summary:

      The authors describe a "beads-on-a-string" (BOAS) immunogen, in which they link up to eight distinct hemagglutinin (HA) head domains from circulating and non-circulating influenza viruses using a flexible glycine-serine linker, and assess their immunogenicity. They also display some of their immunogens on ferritin NPs and compare their immunogenicity. They conclude that this new platform can be used to elicit robust immune responses to multiple influenza subtypes with a single immunogen and that it could also be applied to other viral proteins.

      Strengths:

      The paper is clearly written. While flexible linkers have been used many times, this particular approach (linking different HA subtypes in the same construct, resembling beads on a string, as the authors describe their display platform) is novel and could be of interest.

      Weaknesses:

      The authors did not compare their BOAS to individual HAs immunized as cocktails, nor to previously published mosaic NPs, so it is difficult to assess how the BOAS compare. Other weaknesses include the lack of a rationale for the choice of subtypes and of an explanation for the different sizes of the HA1 constructs (apart from expression). Have the authors tried other lengths? Have they expressed all of them as full-length HA1?

      We appreciate the reviewer’s comments. We responded to the concerns below and modified the manuscript accordingly.

      (1) We recognize that including a “cocktail” control is important to understand how the multivalency present in a single immunogen affects the immune response. We now include an additional control group composed of a mixture of the same eight HA heads used in the 8-mer and the BOAS nanoparticle (NP). While this cocktail elicited HA binding titers similar to those of the 8-mer and BOAS NP immunogens (Fig. 6G), there was no detectable neutralization of any of the viruses tested (Fig. 7).

      (2) In the introduction we reference other multivalent display platforms, but distinct differences between their immunogen design and ours make direct comparisons difficult, which is ultimately why we did not use them as comparators in our in vivo studies. Perhaps most directly relevant to our BOAS platform is the mosaic HA NP from Kanekiyo et al. (PMID 30742080). There, HA heads with boundaries similar to ours were selected from historical H1N1 strains. These NPs, however, were significantly less antigenically diverse than our BOAS NPs, as they did not include any group 2 (e.g., H7, H9) or influenza B HAs; restricting their multivalent display to group 1 H1N1s was likely an important factor in how they achieved broad, neutralizing H1N1 responses. Additionally, Cohen et al. (PMID 33661993) used similarly antigenically distinct HAs in their mosaic NP, though these were full-length HAs including the conserved stem region, which likely has a significant impact on the cross-reactive responses observed. Lastly, we reference Hills et al. (PMID 38710880), in which the authors designed similar NPs with four tandemly linked betacoronavirus receptor binding domains (RBDs) to make “quartets”. In contrast to our observations, the authors observed increased binding and neutralization titers following conjugation to protein-based NPs. We acknowledge potential differences between the studies, such as the antigen and the larger VLP NP, that could lead to the different observed outcomes.

      (3) We intended to highlight the “plug-and-play” nature of the BOAS platform; theoretically, any HA subtype could be interchanged into the BOAS. To that end, our rationale for selecting the HA subtypes in our proof-of-principle immunogen was to include an antigenically diverse set of circulating and non-circulating HAs that we could ultimately characterize with previously published subtype-specific antibodies that were also conformation-specific. In doing so, these diagnostic antibodies could confirm the presence and conformational integrity of each component. We intentionally did not include HA subtypes for which we did not have a conformation-specific antibody.

      The different sizes of the HA head domains were determined exclusively by expression of the recombinant protein. We have not attempted expression of full-length HA1 domains. Furthermore, we have not attempted to express the full-length HA (inclusive of HA1 and HA2) in our BOAS platform. The primary reason was to avoid including the conserved stem region of HA2, which may distract from the HA1 epitopes (e.g., receptor binding site, lateral patch) that can be engaged by broadly neutralizing antibodies. Additionally, the full-length HA is inherently trimeric and may not be as amenable to our BOAS platform as the monomeric HA1 head domain.

      Reviewer #3 (Public Review):

      This work describes the tandem linkage of influenza hemagglutinin (HA) receptor binding domains of diverse subtypes to create 'beads on a string' (BOAS) immunogens. They show that these immunogens elicit ELISA binding titers against full-length HA trimers in mice, as well as varying degrees of vaccine mismatched responses and neutralization titers. They also compare these to BOAS conjugated on ferritin nanoparticles and find that this did not largely improve immune responses. This work offers a new type of vaccine platform for influenza vaccines, and this could be useful for further studies on the effects of conformation and immunodominance on the resulting immune response.

      Overall, the central claims of immunogenicity in a murine model of the BOAS immunogens described here are supported by the data.

      Strengths included the adaptability of the approach to include several, diverse subtypes of HAs. The determination of the optimal composition of strains in the 5-BOAS that overall yielded the best immune responses was an interesting finding and one that could also be adapted to other vaccine platforms. Lastly, as the authors discuss, the ease of translation to an mRNA vaccine is indeed a strength of this platform.

      One interesting and counterintuitive result is the high neutralization titers against the vaccine-mismatched group 2 H7 in the 5-BOAS group, which differs from the 4-BOAS by the addition of a group 1 H5 RBD. At the same time, no H5 neutralization titers were observed for any of the BOAS immunogens, yet they were seen for the BOAS-NP. Uncovering where these immune responses are directed and why these discrepancies arise would be informative future work.

      There are a few caveats in the data that should be noted:

      (1) 20 µg is a fairly high dose for a mouse, and the majority of the serology presented is after 3 doses at 20 µg. By comparison, 0.5-5 µg is a more typical range (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380945/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9980174/). Also, the authors state that 20 µg per immunogen was used, including for the BOAS-NP group, which would mean that the BOAS-NP group received a lower mass dose of HA RBD relative to the BOAS groups.

      We agree that this is at the “upper end” of recombinant protein doses. While we did not perform a dose-response study, we now include serum analyses after a single prime. The overall trends and reactivity to matched and mismatched BOAS components remained similar at d28 and d42. However, the differences between the BOAS and BOAS NP groups and the mixture group were more pronounced at d28, which reinforces our observation that the multivalency of the HA heads is necessary for eliciting robust serum responses to each component. These data are included in Supplemental Figure 5, and we have modified the text (lines 185-187) to include:

      “Similar binding trends were also observed with d28 serum, though the difference between the 8mer and mix groups was more pronounced at d28 (Supplemental Figure 5).”

      Additionally, we acknowledge that there is a size discrepancy between the BOAS NP and the largest BOAS, leading to an approximately 15-fold difference on a per-mole basis of the BOAS immunogen. The smallest and largest BOAS also differ by ~2.5-fold on a per-mole basis, which could favor the overall amount of the smaller immunogens; however, because vaccine doses are typically calculated on a mg per kg basis, we did not calculate doses on a molar basis for this study. Any promising immunogens will be evaluated in dose-response studies to optimize elicited responses.

      (2) Serum was pooled from all animals per group for neutralization assays, instead of testing individual animals. This could mean that a single animal with higher immune responses than the rest in the group could dominate the signal and potentially skew the interpretation of this data.

      We repeated the neutralization assays with data points for individual mice. There does appear to be variability in the immune response between mice. This is most noticeable for responses to the H5 component. We are currently assessing what properties of our BOAS immunogen might contribute to the variability across individual mice.

      (3) In Figure S2, there appears to be an increase in MW when the order of strains is changed, which may be due to differences in glycosylation. Further analysis would be needed to determine whether glycosylation differs among the BOAS immunogens and how it differs from that of native HAs.

      There does appear to be a relatively small difference in MW between the two BOAS configurations shown in Figure S2. This could be due to differences in glycosylation, as the reviewer points out, and in future studies, we intend to assess the influence of native glycosylation on antibody responses elicited by our BOAS immunogens.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major Concerns

      (1) From Figure 2D-E, it looks like the BOAS are forming clusters rather than a straight line. Do these form aggregates over time, either at 4 °C over a few days or after freeze-thaw cycle(s)? It is unclear from the SEC methods how long after purification this was performed, and stability should be considered.

      Due to the inherent flexibility of the Gly-Ser linker between components, we do not anticipate that any rigidity would be imposed that would result in a “straight line”. Nevertheless, we appreciate the reviewer's concern about the long-term stability of the BOAS immunogens. To address this, we include 1) the extended chromatograms from Figure 2C as Supplemental Figure 3 to show any aggregates present, 2) traces from up to 48 hours post-IMAC, and 3) chromatograms following a freeze-thaw cycle. After IMAC purification, there is a minor peak (<10% of total peak height) at ~9 mL corresponding to aggregates; we excluded these aggregates from the material used for immunizations. Immediately (<24 hrs) after a freeze-thaw cycle, the BOAS maintain a homogeneous peak with no significant (<10%) aggregation or degradation peak. However, after ~1 week at 4 °C following a freeze-thaw cycle, additional peaks in the chromatogram correspond to degradation of the BOAS.

      We modified the materials and methods section to state (lines 370-372):

      “Recovered proteins were then purified on a Superdex 200 (S200) Increase 10/300 GL (for trimeric HAs) or Superose 6 Increase 10/300 GL (for BOAS) size-exclusion column in Dulbecco’s Phosphate Buffered Saline (DPBS) within 48 hours of cobalt resin elution.”

      We commented on BOAS stability in the results section (lines 142-148):

      “Following SEC, affinity tags were removed with HRV-3C protease; cleaved tags, uncleaved BOAS, and His-tagged enzyme were removed using cobalt affinity resin and snap frozen in liquid nitrogen before immunizations. BOAS maintained monodispersity upon thawing, though over time, degradation was observed following longer term (>1 week) storage at 4C (Supplemental Figure 3). This degradation became more significant as BOAS increased in length (Supplemental Figure 3).”

      We also included in the discussion (lines 277-279):

      “Notably, for longer BOAS we observed degradation following longer term storage at 4C, which may reflect their overall stability.”

      (2) Figures 3-4 and 6-7, to make conclusions off of 3 mice per group is inappropriate. A sample size calculation should have been conducted and the appropriate number of mice tested. In addition, two independent mouse experiments should always be performed. Moreover, the reliability of the statistical tests performed seems unlikely, given the very small sample size.

      We agree that additional mice are necessary to make assessments regarding immunogenicity and cross-reactivity differences between the immunogens. To address this, we repeated the immunization with 5 additional mice, for a total of n=8 mice over two independent experiments. We incorporated these data into Figure 3B-D, as well as an additional Figure 3E (see below). We also now report log-transformed endpoint titer (EPT) values rather than reciprocal EC50 values and have clarified the statistical analyses used. We have added the following lines to the methods section:

      lines 427-431:

      “Serum endpoint titers (EPTs) were determined using non-linear regression (sigmoidal, four-parameter logistic (4PL) equation, where x is concentration) to identify the dilution at which the blank-subtracted 450 nm absorbance value intersects a 0.1 threshold. Serum titers for individual mice against respective antigens are reported as log-transformed values of the EPT dilution.”

      lines 406-408:

      “C57BL/6 mice (Jackson Laboratory) (n=8 per group for 3-, 4-, 5-, 6-, 7-, and 8mer cohorts; n=5 for BOAS NP, NP, and mix cohorts) were immunized with 20µg of BOAS immunogens of varying length and adjuvanted with 50% Sigma Adjuvant for a total of 100µL of inoculum.”

      lines 482-490:

      “Statistical Analysis

      Significance for ELISAs and microneutralization assays was determined using Prism (GraphPad Prism v10.2.3). ELISA and microneutralization comparisons of >2 groups were analyzed using a Kruskal-Wallis test with Dunn’s post hoc test to correct for multiple comparisons. Multiple comparisons were made between each possible pair of groups or relative to a control group, where indicated. ELISAs comparing two groups were analyzed using a Mann-Whitney test. Significance was assigned as follows: * = p<0.05, ** = p<0.01, *** = p<0.001, and **** = p<0.0001. Where conditions are compared and no significance is reported, the difference was non-significant.”
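      To make the quoted post hoc procedure concrete, below is a minimal, standard-library Python sketch of Dunn's test on ranks pooled across groups, as typically run after a Kruskal-Wallis test (the omnibus test itself is available as `scipy.stats.kruskal`). It is an illustration under stated assumptions, not the authors' Prism analysis: the function names, the dict-of-lists input, and the Bonferroni-style multiplication of p-values by the number of comparisons are assumptions, and the output has not been checked against GraphPad Prism. The example values are hypothetical, not data from the study.

```python
import math
from itertools import combinations


def rank_with_ties(values):
    """Return 1-based ranks of `values`, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def dunn_test(groups, control=None):
    """Dunn's post hoc test on ranks pooled across all groups.

    groups:  dict mapping group name -> list of values (e.g., log EPTs or IC50s).
    control: if given, compare each other group only against this group;
             otherwise compare every pair of groups.
    Returns a dict mapping (group_a, group_b) -> (z, adjusted two-sided p).
    """
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    labels = [name for name in names for _ in groups[name]]
    n_total = len(pooled)
    ranks = rank_with_ties(pooled)

    mean_rank = {}
    for name in names:
        group_ranks = [r for r, lab in zip(ranks, labels) if lab == name]
        mean_rank[name] = sum(group_ranks) / len(group_ranks)

    # Tie correction: sum of (t^3 - t) over each set of t tied values.
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    variance = n_total * (n_total + 1) / 12.0 - tie_term / (12.0 * (n_total - 1))

    if control is None:
        pairs = list(combinations(names, 2))
    else:
        pairs = [(name, control) for name in names if name != control]

    results = {}
    for a, b in pairs:
        se = math.sqrt(variance * (1.0 / len(groups[a]) + 1.0 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p_raw = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
        # Assumed Bonferroni-style adjustment: multiply by the number of comparisons.
        results[(a, b)] = (z, min(1.0, p_raw * len(pairs)))
    return results


# Hypothetical log-titer values, not data from the study.
example = {"mix": [2.1, 2.4, 2.0], "8mer": [3.9, 4.2, 4.0], "BOAS NP": [4.5, 4.1, 4.4]}
for pair, (z, p_adj) in dunn_test(example, control="mix").items():
    print(pair, round(z, 2), round(p_adj, 3))
```

      Passing `control="mix"` restricts the comparisons to each group versus the mix group, as described for Figure 7; omitting it compares every pair of groups, which enlarges the multiplicity adjustment.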

      (3) One critical control that is missing is a homogenous BOAS, for example, just linking one H1 on a BOAS. Does oligomerization and increasing avidity alone improve humoral immunity?

      We agree that this is an interesting point. However, to address the impact of oligomerization and avidity on humoral immunity, we now include an additional control with a cocktail of the HA heads used in the 8mer. We have incorporated this into Figure 3A, 3D and 3E, Figure 6G, and Figure 7.

      Additionally, we have added the following lines in the manuscript:

      lines 38-40:

      “Finally, vaccination with a mixture of the same HA head domains is not sufficient to elicit the same neutralization profile as the BOAS immunogens or nanoparticles.”

      lines 105-106:

      “Additionally, we showed that a mixture of the same HA head components was not sufficient to recapitulate the neutralizing responses elicited by the BOAS or BOAS NP.”

      lines 169-172:

      “To determine immunogenicity of each BOAS immunogen, we performed a prime-boost-boost vaccination regimen in C57BL/6 mice at two-week intervals with 20µg of immunogen and adjuvanted with Sigma Adjuvant (Figure 3A). We compared these BOAS to a control group immunized with a mixture of the eight HA heads present in the 8mer.”

      lines 265-267:

      “There were qualitatively immunodominant HAs, notably H4 and H9, and these were relatively consistent across BOAS in which they were a component. This effect was reduced in the mix cohort.”

      (4) While some cross-reactivity is likely (Figure 6G), there is considerable loss of binding when there is a mismatch. Of the antibodies induced, how much of this is strain-specific? For example, how well do serum antibodies bind to a pre-2009 H1?

      We agree with the reviewer that there is a considerable loss of binding when there is a mismatched HA component. To better understand this and incorporate a mismatched strain into our analysis of the 8mer and BOAS NP, we looked at serum binding titers to a pre-2009 H1, H1/Solomon Islands/2006, and an antigenically distinct H3, H3/Hong Kong/1968. We have incorporated this data into Figures 3D, 3E, 6F and 6G. We observed relatively high titers against both a mismatched H1 and H3, indicating that the BOAS maintain high titers against subtype-specific strains that are conserved over considerable antigenic distance. However, this was similar in the mixture group, indicating that this may not be specific to oligomerization of BOAS immunogens.

      We added the following to the methods section:

      lines 357-361

      “Head subdomains from these HAs were used in the BOAS immunogens, and full-length soluble ectodomain (FLsE) trimers were used in ELISAs. Additional H1 (H1/A/Solomon Islands/3/2006) and H3 (H3/A/Hong Kong/1/1968) FLsEs were used in ELISAs as mismatched, antigenically distinct HAs for all BOAS.”

      Minor Concerns

      (1) Lines 44-46: the deaths per year are almost exclusively due to seasonal influenza outbreaks caused by antigenically drifted viruses in humans, not by viruses spilling over from avian species and swine. For accuracy, please adjust this sentence.

      We have adjusted lines 45-48 to say “This is largely a consequence of viral evolution and antigenic drift as it circulates seasonally within humans and ultimately impacts vaccine effectiveness. Additionally, the chance for spillover events from animal reservoirs (e.g., avian, swine) is increasing as population and connectivity also increase.”

      (2) Figure 4D-E, provide a legend for what the symbols indicate, or simply just put the symbol next to either the homology score and % serum competition labels on the y-axis.

      We have included a legend in Figures 4D,E to distinguish between homology score and % serum competition.

      (3) I am a bit confused by the data presented in Figure 7. The figure legend says the two symbols represent technical replicates. How? Is one technical replicate of all the mice in a group averaged and that's what's graphed? If so, this is not standard practice. I would encourage the authors to show the average technical replicates of each animal, which is standard.

      We thank the reviewer for their suggestion, and we have revised Figure 7 such that each symbol represents a single animal for n=5 animals. We have also adjusted the figure caption to the following:

      “Figure 7: Microneutralization titers to matched and mis-matched viruses. Microneutralization of matched and mis-matched pseudoviruses: H1N1 (green, top left), H3N2 (orange, top right), H5N1 (yellow, bottom left), and H7N9 viruses (pink, bottom right) with d42 serum. Solid bars below each plot indicate a matched subtype, and striped bars indicate a mis-matched subtype (i.e., not present in the BOAS). NP negative controls were used to determine the threshold for neutralization. Upper and lower dashed lines represent the first dilution (1:32) (for H1N1, H3N2, and H5N1) or the neutralization average with negative control NP serum (H7N9), and the last serum dilution (1:32,768), respectively, and points at the dashed lines indicate IC50s at or outside the limit of detection. Individual points indicate IC50 values from individual mice from each cohort (n=5). The mean is denoted by a bar and error bars are +/- 1 s.d.; * = p<0.05 as determined by a Kruskal-Wallis test with Dunn’s multiple comparison post hoc test relative to the mix group.”

      (4) Paragraphs 298-313, multiple studies are referred to but not referenced.

      We have added the following references to this section:

      (38) Kanekiyo, M. et al. Self-assembling influenza nanoparticle vaccines elicit broadly neutralizing H1N1 antibodies. Nature 498, 102–106 (2013).

      (48) Hills, R. A. et al. Proactive vaccination using multiviral Quartet Nanocages to elicit broad anti-coronavirus responses. Nat. Nanotechnol. 1–8 (2024) doi:10.1038/s41565-024-01655-9.

      (65) Jardine, J. et al. Rational HIV immunogen design to target specific germline B cell receptors. Science 340, 711–716 (2013).

      (66) Tokatlian, T. et al. Innate immune recognition of glycans targets HIV nanoparticle immunogens to germinal centers. Science 363, 649–654 (2019).

      (67) Kato, Y. et al. Multifaceted Effects of Antigen Valency on B Cell Response Composition and Differentiation In Vivo. Immunity 53, 548-563.e8 (2020).

      (68) Marcandalli, J. et al. Induction of Potent Neutralizing Antibody Responses by a Designed Protein Nanoparticle Vaccine for Respiratory Syncytial Virus. Cell 176, 1420-1431.e17 (2019).

      (69) Bruun, T. U. J., Andersson, A.-M. C., Draper, S. J. & Howarth, M. Engineering a Rugged Nanoscaffold To Enhance Plug-and-Display Vaccination. ACS Nano 12, 8855–8866 (2018).

      (70) Kraft, J. C. et al. Antigen- and scaffold-specific antibody responses to protein nanoparticle immunogens. Cell Reports Medicine 100780 (2022) doi:10.1016/j.xcrm.2022.100780.

      Reviewer #2 (Recommendations For The Authors):

      Can the authors define "detectable titers"?

      Maybe add a threshold value of reciprocal EC50 on the figure for each plot.

      We recognize the reviewer's concern with reporting serum titers in this way, and we now report endpoint titers (EPT), with a dotted line marking the first detectable dilution (1:50). We have also adjusted the methods section to reflect this change:

      (lines 427-431)

      “Serum endpoint titers (EPTs) were determined using non-linear regression (sigmoidal, four-parameter logistic (4PL) equation, where x is concentration) to identify the dilution at which the blank-subtracted 450 nm absorbance value intersects a 0.1 threshold. Serum titers for individual mice against respective antigens are reported as log-transformed values of the EPT dilution.”
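      The threshold-crossing step in the quoted EPT method can be sketched as follows. This is a minimal standard-library Python illustration, assuming a 4PL curve has already been fitted to the blank-subtracted absorbances (for example in Prism or with `scipy.optimize.curve_fit`) with x taken as the reciprocal serum dilution; the parameterization, the base-10 log, and the function names are assumptions rather than the authors' code, and the parameter values in the usage line are hypothetical. Because the 4PL equation can be inverted in closed form, no numerical root-finding is needed.

```python
import math


def four_pl(x, top, bottom, ec50, hill):
    """4PL curve: blank-subtracted A450 at reciprocal dilution x (assumed form)."""
    return bottom + (top - bottom) / (1.0 + (x / ec50) ** hill)


def endpoint_titer(top, bottom, ec50, hill, threshold=0.1):
    """Reciprocal dilution at which the fitted 4PL curve crosses `threshold`.

    Obtained by solving threshold = four_pl(x, ...) for x in closed form.
    """
    if not (min(top, bottom) < threshold < max(top, bottom)):
        # The curve never reaches the threshold, so no EPT can be interpolated.
        raise ValueError("threshold lies outside the range of the fitted curve")
    ratio = (top - threshold) / (threshold - bottom)
    return ec50 * ratio ** (1.0 / hill)


def log_endpoint_titer(top, bottom, ec50, hill, threshold=0.1):
    """log10-transformed EPT (base 10 is an assumption)."""
    return math.log10(endpoint_titer(top, bottom, ec50, hill, threshold))


# Hypothetical fitted parameters for one mouse against one antigen.
titer = log_endpoint_titer(top=3.0, bottom=0.02, ec50=1200.0, hill=1.1)
```

      Sera whose fitted curve never falls to the threshold raise an error here; how such samples should be reported (for example, at the 1:50 limit of detection) is a separate choice this sketch does not make.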

      It also appears that not all X-mers elicit an immune response against the matched HAs, e.g., the 7- and 8-mer. It is not clear why the authors do not mention this; it could be due to too many HAs.

      We apologize for the confusion and agree that our original method of reporting EC50 values does not capture weak but present binding titers. With additional mice and our revised method of reporting titers, Figure 3D now shows more clearly that all X-mer BOAS do elicit detectable binding titers to matched HA components.

      It would be nice to add a conclusion regarding cross-reactivity; again, it appears that beyond the 6-mer there is a loss in cross-reactivity even though there are more subtypes on the BOAS.

      Also, the TI seemed to be the more conserved epitope targeted here.

      (Of note these two are mentioned in the discussion)

      We have updated the results section to include the following:

      (lines 281-294)

      “Based on the immunogenicity of the various BOAS and their ability to elicit neutralizing responses, it may not be necessary to maximize the number of HA heads into a single immunogen. Indeed, it qualitatively appears that the intermediate 4-, 5-, and 6mer BOAS were the most immunogenic and this length may be sufficient to effectively engage and crosslink BCRs for potent stimulation. These BOAS also had similar or improved binding cross-reactivity to mis-matched HAs as compared to longer 7- or 8mer BOAS. Notably, the 3mer BOAS elicited detectable cross-reactive binding titers to H4 and H5 mismatched HAs in all mice. This observed cross-reactivity could be due to sequence conservation between the HAs, as H3 and H4 share ~51% sequence identity, and H1 and H2 share ~46% and ~62% overall sequence identity with H5, respectively (Supplemental Figure 6). Additionally, the degree of surface conservation decreased considerably beyond the 5mer as more antigenically distinct HAs were added to the BOAS. These data suggest that both antigenic distance between HA components and BOAS length play a key role in eliciting cross-reactive antibody responses, and further studies are necessary to optimize BOAS valency and antigenic distance for a desired response.”

      Figure 5E, the authors could indicate which subtype each mab is specific to for those who are not HA experts. (They have them color-coded but it is hard to see because very small).

      The authors also do not explain why 3E5 does not bind well to the H1, H2, H3, H4 4-mer BOAS, etc.

      We apologize for the lack of clarity in this figure. We updated Figure 5E to include the subtype it is specific for as well as listing the antibodies and their subtype and targeted epitope in the figure caption.

      Minor

      Figure 1B: in the zoomed view, the line appears to be hidden behind the structure; it should come in front.

      We adjusted the figure accordingly.

      Line 127 - whether the order

      Corrected

      What is the rationale for thinking that a different order will lead to a different expression and antigenic results?

      We thank the reviewer for this question. We did not necessarily anticipate a difference in protein expression based on BOAS order. We did, however, want to verify that our platform was indeed “plug-and-play” and that we could readily exchange components and their order. We do hypothesize that a different order may lead to different antigenic results. We think that the conformation of the BOAS, as well as the physical and antigenic distance between HA components, may influence the efficiency of BCR cross-linking and lead to different antigenic results with different levels of cross-reactivity. For example, a BOAS design with a cluster of group 1 HAs followed by a cluster of group 2 HAs, rather than our roughly alternating pattern, could affect which HAs are in proximity to each other or which could be shielded in certain conformations, and thus could affect antigenic results. We expand on this rationale in the discussion in lines 310-314:

      “Further studies with different combinations of HAs could aid in understanding how length and composition influence epitope focusing. For example, a BOAS design with a cluster of group 1 HAs followed by a cluster of group 2 HAs, rather than our roughly alternating pattern, could impact which HAs are in close proximity to one another or could be potentially shielded in certain conformations, and thus could affect antigenic results.”

      Maybe list HA#1, HA#2, HA#3 instead of HA1, HA2, HA3 to make sure they are not confused with the HA1 and HA2 subunits.

      We agree that this may be confusing for readers, and have adjusted Figure 1C to show HA#1, HA#2, etc.

      For nsEM, do the authors have 2D classes or even 3D reconstructions? Lines 148-149: maybe, or perhaps simply because there are more HAs.

      We did not obtain 2D classes or 3D reconstructions of these BOAS. However, we agree with the reviewer that the collapsed/rosette structure of the 8mer BOAS may be a consequence of the additional HA heads as well as the flexible Gly-Ser linkers between the components. We have clarified our statement in the discussion, which now reads:

      lines 154-156:

      “This is likely a consequence of the flexible GSS linker separating the individual HA head components as well as the addition of significantly more HA head components to the construct.”.

      Line 153 " interface-directed" - what does this mean?

      We apologize for any confusion; we intend “interface-directed” to refer to antibodies that engage the trimer interface (TI) epitope between HA protomers. We have adjusted the manuscript to use the same terminology throughout, i.e. trimer interface or its abbreviation, TI.

      For Figure 2 F - do you have a negative control? Usually one does not determine an ELISA KD, it is not very accurate but shows binding in terms of OD value.

      We did include a negative control, MEDI8852, a stem-directed antibody, though it was not shown in the figure because, as expected, we observed no binding. This negative control antibody was also used in Figure 5E for characterizing the BOAS NPs, and also showed no binding. We recognize that in an ELISA the K<sub>D</sub> is an equilibrium measurement and that we do not report kinetic measurements as determined by a method such as bio-layer interferometry (BLI); we have therefore adjusted the figure caption to denote the values as “apparent K<sub>D</sub> values”.
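      As a point of reference for the “apparent K<sub>D</sub>” wording, the sketch below fits a one-site saturation model, OD = B<sub>max</sub>·C/(K<sub>D</sub> + C), to a background-subtracted ELISA titration. It is an illustration under stated assumptions, not the authors' Prism analysis: the model choice, the grid-search fit, and the function names are hypothetical, and K<sub>D</sub> comes out in whatever concentration units the input uses.

```python
import math

def one_site(conc, kd, bmax):
    """Assumed one-site saturation model: predicted OD at antibody concentration `conc`."""
    return bmax * conc / (kd + conc)

def fit_apparent_kd(concs, ods, kd_min=1e-3, kd_max=1e3, steps=2000):
    """Least-squares fit by grid search over log10(Kd) (hypothetical helper).

    For a fixed Kd the model is linear in Bmax, so the best Bmax has a
    closed form. Returns (kd, bmax).
    """
    lo, hi = math.log10(kd_min), math.log10(kd_max)
    best = None
    for i in range(steps + 1):
        kd = 10 ** (lo + (hi - lo) * i / steps)
        f = [c / (kd + c) for c in concs]
        bmax = sum(y * fi for y, fi in zip(ods, f)) / sum(fi * fi for fi in f)
        sse = sum((y - bmax * fi) ** 2 for y, fi in zip(ods, f))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]

# Usage with synthetic, noise-free data (true Kd = 2.0, Bmax = 3.0).
concs = [0.01 * 3 ** i for i in range(10)]
ods = [one_site(c, 2.0, 3.0) for c in concs]
kd, bmax = fit_apparent_kd(concs, ods)
```

Because the OD plateau depends on detection reagents and antigen coating, a value fitted this way is best read as a relative, apparent affinity for comparing antibodies within one assay.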

      Line 169 - reads strangely: "BOAS-elicited serum, regardless of its length, reacted". The length is that of the immunogen, not the serum.

      We agree that this statement is unclear, and we have modified the sentence to read:

      lines 177-178:

      “Each of the BOAS, regardless of its length, elicited binding titers to all matched full-length HAs representing individual components (Figure 3D).”

      What is the adjuvant used (add in results)?

      We used Sigma adjuvant for all immunizations, and have included this information in the results section:

      lines 169-171:

      “To determine the immunogenicity of each BOAS, we performed a prime-boost-boost vaccination regimen in C57BL/6 mice at two-week intervals with 20µg of immunogen adjuvanted with Sigma Adjuvant (Figure 3A).”

      This information is also included in the methods section in lines 406-412.

      Line 178 - remove " across"

      We have removed the word “across” in this sentence and replaced it with “on” (line 194)

      "Trimer interface" and "interface" epitopes are used interchangeably - maybe keep it as "trimer interface" to be more precise.

      As stated above, we have adjusted the manuscript to use the same term throughout, i.e., trimer interface or its abbreviation, TI.

      Line 221 - no figure 6H (6G?)

      We apologize for this typo and have corrected to Figure 6G (line 231)

      Reviewer #3 (Recommendations For The Authors):

      (1) Since 20 ug x3 doses is quite a high amount of vaccine, differences between immunogens may become blurred. Thus, it may be informative to compare post-prime serology for all immunogens or select immunogens to compare to the post-3rd dose data.

      We agree with the reviewer that this is on the upper end of vaccine dosing, and we therefore examined the serum responses after a single boost. The overall trends and reactivity to matched and mismatched BOAS components remained similar at d28 and d42. However, the differences between the BOAS and BOAS NP groups and the mixture group were more pronounced at d28, which bolsters our claim that the presentation of the HA heads is important for eliciting strong serum responses to all components. We have included these data in Supplemental Figure 5 and have acknowledged this in the text:

      lines 185-187:

      “Similar binding trends were also observed with d28 serum, though the difference between the 8mer and mix groups was more pronounced at d28 (Supplemental Figure 5).”

      (2) Significance statistics for all immunogenicity data should be added and discussed; it is particularly absent in Figures 3D and 7.

      We have added statistical analyses to Figure 3 and Figure 7 to reflect changes in immunogenicity. We have also added the following to the methods section:

      lines 482-490:

      “Statistical Analysis

      Significance for ELISAs and microneutralization assays was determined using either a Mann-Whitney test or a Kruskal-Wallis test with Dunn’s post-hoc test in Prism (GraphPad Prism v10.2.3) to correct for multiple comparisons. Multiple comparisons were made between each possible pair of groups or relative to a control group, where indicated. Significance was assigned as follows: * = p<0.05, ** = p<0.01, *** = p<0.001, and **** = p<0.0001. Where conditions are compared and no significance is reported, the difference was non-significant.”
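      The post-hoc procedure named in the statistics paragraph can be sketched in plain Python: Dunn's pairwise z-tests on mean ranks (with the standard tie correction), multiplicity-adjusted here by Bonferroni, plus the mapping from p-values to the asterisks used in the figures. This is an illustration only, not a reproduction of Prism's implementation; the omnibus Kruskal-Wallis test and the Mann-Whitney test are not shown, and the group names and values are hypothetical.

```python
import math
from itertools import combinations

def rank_with_ties(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def dunn_test(groups):
    """Dunn's pairwise comparisons on pooled ranks, Bonferroni-adjusted.

    groups: dict mapping group name -> list of values (e.g. log titers).
    Returns {(name_a, name_b): (z, adjusted_p)}.
    """
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    ranks = rank_with_ties(pooled)
    n_total = len(pooled)
    mean_rank, start = {}, 0
    for name in names:
        n = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + n]) / n
        start += n
    # Tie correction: sum over tied value groups of (t^3 - t).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    var_base = n_total * (n_total + 1) / 12 - ties / (12 * (n_total - 1))
    pairs = list(combinations(names, 2))
    results = {}
    for a, b in pairs:
        se = math.sqrt(var_base * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results[(a, b)] = (z, min(1.0, p * len(pairs)))
    return results

def stars(p):
    """Map a p-value to the significance labels used in the figures."""
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"
```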

      (3) Figure 2F: the figure has K03.12 listed for the H3-specific mAb and in the main text, but the caption says 3E5 - is the 3E5 in the caption a typo? 3E5 is listed for the competition ELISAs as an RBS mAb, but its binding site is distal to the RBS at residues 165-170 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9787348/), H7.167 binds in the RBS periphery and not directly within the RBS, and the epitope for P2-D9 is undetermined/not presented. This could mean that there is actually a higher proportion of RBS-directed antibodies than what is determined from this serum competition data. Also, reference to these as 'RBS-directed' in the serum competition methods section should be revised for accuracy.

      We sincerely apologize for this error and the resulting confusion. 3E5 in the caption is incorrect and should be K03.12 (https://www.rcsb.org/structure/5W08) and does engage the receptor binding site. We also apologize for the oversight that H7.167 is in the RBS periphery and not directly in the RBS. The additional P2-D9 in the panel of RBS-directed antibodies was also in error, as we do not believe it is RBS-directed, but is indeed H4 specific. We also included a reference to the paper and immunogen that elicited this antibody. We agree that this indicates that there could be a higher proportion of RBS-directed antibodies in the serum and have modified the text in the results and methods sections to read:

      lines 300-306:

      “Notably, this proportion is approximate, as at the time of reporting, antibodies that bind the receptor binding site of all components were not available. RBS-directed antibodies to the H4 and H9 components were not available, and the RBS-directed antibodies used to target the other HA components have different footprints around the periphery of the RBS. Additionally, there are currently no reported influenza B TI-directed antibodies in the literature. Therefore, this may be an underestimate of the serum proportion focused to the conserved RBS and TI epitopes.”

      lines 435-439:

      “Following blocking with BSA in PBS-T, blocking solution was discarded and 40µL of either DPBS (no competition control), a cocktail of humanized antibodies targeting the RBS and periphery (5J8, 2G1, K03.12, H5.3, H7.167, H1209), a cocktail of humanized TI-directed antibodies (S5V2-29, D1 H1-17/H3-14, D2 H1-1/H3-1), or a negative control antibody (MEDI8852) were added at a concentration of 100µg/mL per antibody.”
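      The proportion of serum binding attributed to an epitope in a competition ELISA is commonly read out as the percent reduction in signal relative to the no-competition (DPBS) well. The sketch below assumes that formula and a single signal value per well (for example a background-subtracted OD or an area under the titration curve); the response above does not state the exact calculation, and the function names are illustrative.

```python
def percent_competition(signal_with_competitor, signal_no_competition):
    """Assumed readout: percent reduction in serum binding vs. the DPBS well."""
    return 100.0 * (1.0 - signal_with_competitor / signal_no_competition)

def mean_competition(pairs):
    """Average percent competition over (competitor, no-competition) signal pairs,
    e.g. one pair per immunized mouse (hypothetical layout)."""
    values = [percent_competition(c, ref) for c, ref in pairs]
    return sum(values) / len(values)
```

Under this assumption, the MEDI8852 negative-control wells would be expected to give values near 0%.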

      (4) Only nsEM data is shown for the 3-BOAS and 8-BOAS, where differences in morphology were seen between these longer and shorter proteins. Including nsEM images for all BOAS immunogens may show trends in morphology or organization that could correlate with immune responses, e.g. if the 5-BOAS also forms a higher proportion of rosette-like structures, while the 4-BOAS is still a mix between extended and rosette-like, this could be a factor in the better immune responses seen for 5-BOAS.

      We appreciate the reviewer’s suggestion for further analysis of morphology between the intermediate BOAS sizes. We agree that the relationship between BOAS length and morphology should be explored more in depth, and we intend to do so in future studies and to also vary linker length and rigidity.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This was a clearly written manuscript that did an excellent job summarizing complex data.

      In this manuscript, Cuevas-Zuviría et al. use protein modeling to generate over 5,000 predicted structures of nitrogenase components, encompassing both extant and ancestral forms across different clades. The study highlights that key insertions define the various Nif groups. The authors also examined the structures of three ancestral nitrogenase variants that had been previously identified and experimentally tested. These ancestral forms were shown in earlier studies to exhibit reduced activity in Azotobacter vinelandii, a model diazotroph. This work provides a useful resource for studying nitrogenase evolution.

      However, its impact is somewhat limited due to a lack of evidence linking the observed structural differences to functional changes. For example, in the ancestral nitrogenase structures, only a small set of residues (lines 421-431) were identified as potentially affecting interactions between nitrogenase components. Why didn't the authors test whether reverting these residues to their extant counterparts could improve nitrogenase activity of the ancestral variants?

      We thank the reviewer for their thoughtful comments. We acknowledge that our current study is primarily focused on a computational exploration of the structural differences in both extant and ancestral nitrogenase variants, which allowed us to generate a comprehensive structural dataset. Although we did not carry out experimental reversion tests in this study, we agree that directly assessing the functional consequences of reverting the specific residues (lines 420 to 429) to their extant counterparts is an important next step to elucidate their functional role. Indeed, these findings provide a valuable foundation for our future work, which is designed to include experimental characterization of these variants and further elucidate the role of critical residues in nitrogenase activity and evolution. We believe that these experiments will offer the direct functional validation that the reviewer has rightly pointed out, and we look forward to reporting on these results in a future study.

      Additionally, the paper feels somewhat disconnected. The predicted nitrogenase structures discussed in the first half of the manuscript were not well integrated with the findings from the ancestral structures. For instance, do the ancestral nitrogenase structures align with the predicted models? This comparison was never explicitly made and could have strengthened the study's conclusions.

      We thank the reviewer for this suggestion. Our original analysis (previously shown in Figure S9, now Figure S10) included structural alignment comparisons. In response, we have reorganized the results section (lines 351-355) to explicitly address this comparison.

      Reviewer #2 (Public review):

      This work aims to study the evolution of nitrogenases, understanding how their structure and function adapted to changes in the environment, including oxygen levels and changes in metal availability. The study predicts > 5000 structures of nitrogenases, corresponding to extant, ancestral, and alternative ancestral sequences. It is observed that structural variations in the nitrogenases correlate with phylogenetic relationships. The amount of data generated in this study represents a massive undertaking that is certain to be a resource for the community. The study also provides strong insight into how structural evolution correlates with environmental and biological phenotypes.

      The challenge with this study is that all (or nearly all) of the quantitative analyses presented are based on RMSD calculations, many of which are under 2 angstroms. For all intents and purposes, two structures with RMSD < 2 angstroms could be considered 'structurally identical'. A lot of insight generated is based on minuscule differences in RMSD, for which it is not clear that they are significantly different. The suggestion would be to find a way to evaluate the RMSD metric and determine whether these values, as obtained for structures being compared, are reliable. Some options are provided in earlier studies: PMID: 11514933, PMID: 17218333, PMID: 11420449, PMID: 8289285 (and others). It could also be valuable to focus more on site-specific RMSDs rather than Global RMSDs. The high conservation in the nitrogenases likely ensures that the global RMSDs will remain low across the family. Focusing on specific regions might reveal interesting differences between clades that are more informative regarding the evolution of structure in tandem with environment/time.

      We thank the reviewer for their suggestions. We agree that while global RMSD values below 2Å typically indicate high structural similarity, relying solely on these measures can mask subtle yet potentially functionally meaningful differences. Our aim was not to test for overall structural identity but rather to quantify fine-scale variations between highly conserved nitrogenase structures, including extant and ancestral variants. Nevertheless, in light of the reviewer’s suggestions, we have implemented an additional metric (rmsd<sub>100</sub>) for a more nuanced comparison. The results of our additional analyses (Figure S3) align closely with our original results (Figure 2), supporting our decision to retain the un-normalized results in the main text. As an additional measure, we also computed site-specific RMSDs for the active-site environments (Figure S6) to further delineate subtle structural variations.
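      The difference between global RMSD, site-specific RMSD and rmsd<sub>100</sub> can be summarized in a few lines. The sketch assumes the two structures have already been superposed (e.g. by a Kabsch fit, not shown) and that atoms are matched one-to-one; it uses the size normalization proposed by Carugo and Pongor (2001), RMSD<sub>100</sub> = RMSD / (1 + ln √(N/100)). Function names are illustrative, not the authors' pipeline.

```python
import math

def rmsd(coords_a, coords_b, indices=None):
    """RMSD between two already-superposed structures.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples for matched atoms
    (e.g. C-alpha atoms of aligned residues). `indices` restricts the calculation
    to a subset of atoms, giving a site-specific RMSD.
    """
    idx = list(range(len(coords_a))) if indices is None else list(indices)
    total = 0.0
    for i in idx:
        total += sum((a - b) ** 2 for a, b in zip(coords_a[i], coords_b[i]))
    return math.sqrt(total / len(idx))

def rmsd100(rmsd_value, n_matched):
    """Size-normalized RMSD: RMSD / (1 + ln(sqrt(N / 100))).

    Equals the raw RMSD at N = 100; only defined for N > 100 / e**2 (about 13.5).
    """
    return rmsd_value / (1.0 + math.log(math.sqrt(n_matched / 100.0)))
```

Because the normalization shrinks values for larger N, rmsd<sub>100</sub> lets comparisons of different lengths be placed on a common scale, while the `indices` argument gives the region-focused view the reviewer suggested.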

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      Examination of (a)periodic brain activity has gained particular interest in the last few years in the neuroscience fields relating to cognition, disorders, and brain states. Using large EEG/MEG datasets from younger and older adults, the current study provides compelling evidence that age-related differences in aperiodic EEG/MEG signals can be driven by cardiac rather than brain activity. Their findings have important implications for all future research that aims to assess aperiodic neural activity, suggesting control for the influence of cardiac signals is essential.

      We want to thank the editors for their assessment of our work and highlighting its importance for the understanding of aperiodic neural activity. Additionally, we want to thank the three present and four former reviewers (at a different journal) whose comments and ideas were critical in shaping this manuscript to its current form. We hope that this paper opens up many more questions that will guide us - as a field - to an improved understanding of how “cortical” and “cardiac” changes in aperiodic activity are linked and want to invite readers to engage with our work through eLife’s comment function.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The present study addresses whether physiological signals influence aperiodic brain activity, with a focus on age-related changes. The authors report age effects on aperiodic cardiac activity derived from ECG in low and high frequency ranges in roughly 2300 participants from four different sites. Slopes of the ECGs were associated with common heart rate variability measures, which, according to the authors, shows that the ECG, even at higher frequencies, conveys meaningful information. Using temporal response functions on concurrent ECG and M/EEG time series, the authors demonstrate that cardiac activity is instantaneously reflected in neural recordings, even after applying ICA to remove cardiac activity. This was more strongly the case for EEG than MEG data. Finally, spectral parameterization was done in large-scale resting-state MEG and ECG data in individuals between 18 and 88 years, and age effects were tested. A steepening of spectral slopes with age was observed particularly for ECG and, to a lesser extent, in cleaned MEG data in most frequency ranges and sensors investigated. The authors conclude that commonly observed age effects on neural aperiodic activity can mainly be explained by cardiac activity.

      Strengths:

      Compared to previous investigations, the authors demonstrate the effects of aging on the spectral slope in the currently largest MEG dataset with equal age distribution available. Their efforts of replicating observed effects in another large MEG dataset and considering potential confounding by ocular activity, head movements, or preprocessing methods are commendable and valuable to the community. This study also employs a wide range of fitting ranges and two commonly used algorithms for spectral parameterization of neural and cardiac activity, hence providing a comprehensive overview of the impact of methodological choices. Based on their findings, the authors give recommendations for the separation of physiological and neural sources of aperiodic activity.

      Weaknesses:

      While the aim of the study is well-motivated and analyses rigorously conducted, the overall structure of the manuscript, as it stands now, is partially misleading. Some of the described results are not well-embedded and lack discussion.

      We want to thank the reviewer for their comments focussed on improving the overall structure of the manuscript. We agree with their suggestions that some results could be more clearly contextualized and restructured the manuscript accordingly.

      Reviewer #2 (Public review):

      I previously reviewed this important and timely manuscript at a previous journal where, after two rounds of review, I recommended publication. Because eLife practices an open reviewing format, I will recapitulate some of my previous comments here, for the scientific record.

      In that previous review, I revealed my identity to help reassure the authors that I was doing my best to remain unbiased because I work in this area and some of the authors' results directly impact my prior research. I was genuinely excited to see the earlier preprint version of this paper when it first appeared. I get a lot of joy out of trying to - collectively, as a field - really understand the nature of our data, and I continue to commend the authors here for pushing at the sources of aperiodic activity!

      In their manuscript, Schmidt and colleagues provide a very compelling, convincing, thorough, and measured set of analyses. Previously I recommended that they push even further, and they added the current Figure 5 analysis of event-related changes in the ECG during working memory. In my opinion, this result practically warrants a separate paper of its own!

      The literature analysis is very clever, and expanded relative to any prior version I've seen.

      In my previous review, the broadest, most high-level comment I wanted to make was that the authors are correct. We (in my lab) have tried to be measured in our approach to talking about aperiodic analyses - including now measuring ECG when possible - because there are so many sources of aperiodic activity: neural, ECG, respiration, skin conductance, muscle activity, electrode impedances, room noise, electronics noise, etc. The authors discuss this all very clearly, and I commend them on that. We, as a field, should move more toward a model where we can account for all of those sources of noise together. (This was less of an action item, and more of an inclusion of a comment for the record.)

      I also very much appreciate the authors' excellent commentary regarding the physiological effects that pharmacological challenges such as propofol and ketamine also have on non-neural (autonomic) functions such as ECG. Previously I also asked them to discuss the possibility that, while their manuscript focuses on aperiodic activity, it is possible that the wealth of literature regarding age-related changes in "oscillatory" activity might be driven partly by age-related changes in neural (or non-neural, ECG-related) changes in aperiodic activity. They have included a nice discussion on this, and I'm excited about the possibilities for cognitive neuroscience as we move more in this direction.

      Finally, I previously asked for recommendations on how to proceed. The authors convinced me that we should care about how the ECG might impact our field potential measures, but how should I, as a relative novice, proceed? They now include three strong recommendations at the end of their manuscript that I find to be very helpful.

      As was obvious from my previous review, I consider this to be an important and impactful cautionary report that is incredibly well supported by multiple thorough analyses. The authors have done an excellent job responding to all my previous comments and concerns and, in my estimation, those of the previous reviewers as well.

      We want to thank the reviewer for agreeing to review our manuscript again and for recapitulating on their previous comments and the progress the manuscript has made over the course of the last ~2 years. The reviewer's comments have been essential in shaping the manuscript into its current form. Their feedback has made the review process truly feel like a collaborative effort, focused on strengthening the manuscript and refining its conclusions and resulting recommendations.

      Reviewer #3 (Public review):

      Summary:

      Schmidt et al., aimed to provide an extremely comprehensive demonstration of the influence cardiac electromagnetic fields have on the relationship between age and the aperiodic slope measured from electroencephalographic (EEG) and magnetoencephalographic (MEG) data.

      Strengths:

      Schmidt et al., used a multiverse approach to show that the cardiac influence on this relationship is considerable, by testing a wide range of different analysis parameters (including extensive testing of different frequency ranges assessed to determine the aperiodic fit), algorithms (including different artifact reduction approaches and different aperiodic fitting algorithms), and multiple large datasets to provide conclusions that are robust to the vast majority of potential experimental variations.

      The study showed that across these different analytical variations, the cardiac contribution to aperiodic activity measured using EEG and MEG is considerable, and likely influences the relationship between aperiodic activity and age to a greater extent than the influence of neural activity.

      Their findings have significant implications for all future research that aims to assess aperiodic neural activity, suggesting control for the influence of cardiac fields is essential.

      We want to thank the reviewer for their thorough engagement with our work and for the many excellent ideas raised both in the Weaknesses section and in the Recommendations for the Authors below. Their suggestions have sparked many ideas on how to better separate peripheral physiological from neurophysiological signals, which are likely to greatly influence our future attempts to extract both cardiac and muscle activity from M/EEG recordings. We thank them for their input, time and effort!

      Weaknesses:

      Figure 4I: The regressions explained here seem to contain a very large number of potential predictors. Based on the way it is currently written, I'm assuming it includes all sensors for both the ECG component and ECG rejected conditions?

      I'm not sure about the logic of taking a complete signal, decomposing it with ICA to separate out the ECG and non-ECG signals, then including these latent contributions to the full signal back into the same regression model. It seems that there could be some circularity or redundancy in doing so. Can the authors provide a justification for why this is a valid approach?

      After observing significant effects both in the MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> conditions in similar frequency bands we wanted to understand whether or not these age-related changes are statistically independent. To test this we added both variables as predictors in a regression model (thereby accounting for the influence of the other in relation to age). The regression models we performed were therefore actually not very complex. They were built using only two predictors, namely the data (in a specific frequency range) averaged over channels on which we noticed significant effects in the ECG rejected and ECG components data respectively (Wilkinson notation: age ~ 1 + ECG rejected + ECG components). This was also described in the results section stating that: “To see if MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub> explain unique variance in aging at frequency ranges where we noticed shared effects, we averaged the spectral slope across significant channels and calculated a multiple regression model with MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> as predictors for age (to statistically control for the effect of MEG<sub>ECG component</sub>s and MEG<sub>ECG rejected</sub> on age). This analysis was performed to understand whether the observed shared age-related effects (MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub>) are in(dependent).”  

      We hope this explanation solves the previous misunderstanding.
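      The two-predictor model described above (age ~ 1 + ECG rejected + ECG components) can be illustrated with ordinary least squares. This is a simplified stand-in: the response describes Bayesian robust (Student-t) regression with weakly informative priors, which this sketch does not implement, and the inputs are hypothetical channel-averaged slope values.

```python
def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_age_model(age, ecg_rejected, ecg_component):
    """OLS fit of age ~ 1 + ecg_rejected + ecg_component.

    Returns [intercept, b_rejected, b_component]; each slope is the association
    with age while holding the other predictor fixed.
    """
    rows = [[1.0, r, c] for r, c in zip(ecg_rejected, ecg_component)]
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, age)) for i in range(3)]
    return solve(xtx, xty)
```

Fitting both predictors jointly is what allows the shared age effects to be checked for independence: if one predictor's coefficient stays clearly non-zero with the other in the model, it carries unique variance in age.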

      I'm not sure whether there is good evidence or rationale to support the statement in the discussion that the presence of the ECG signal in reference electrodes makes it more difficult to isolate independent ECG components. The ICA algorithm will still function to detect common voltage shifts from the ECG as statistically independent from other voltage shifts, even if they're spread across all electrodes due to the referencing montage. I would suggest there are other reasons why the ICA might lead to imperfect separation of the ECG component (assumption of the same number of source components as sensors, non-Gaussian assumption, assumption of independence of source activities).

      The inclusion of only 32 channels in the EEG data might also have reduced the performance of ICA, increasing the chances of imperfect component separation and the mixing of cardiac artifacts into the neural components, whereas the higher number of sensors in the MEG data would enable better component separation. This could explain the difference between EEG and MEG in the ability to clean the ECG artifact (and perhaps higher-density EEG recordings would not show the same issue).

      The reviewer makes a good argument that our initial assumption, namely that the presence of cardiac activity on the reference electrode influences the performance of the ICA, may be wrong. After rereading and reconsidering the matter, we think that the reviewer is correct and that their explanations for why the ECG signal was not easily separable from our EEG recordings are more plausible and better grounded in the literature than our initial suggestion. We therefore now highlight their view as the main reason why ECG rejection was more challenging in EEG data. However, we also note that identifying the exact reason is probably an empirical question that demands further research, stating that:

      “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-Gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources.”

      In addition to the inability to effectively clean the ECG artifact from EEG data, ICA and other component subtraction methods have also all been shown to distort neural activity in periods that aren't affected by the artifact due to the ubiquitous issue of imperfect component separation (https://doi.org/10.1101/2024.06.06.597688). As such, component subtraction-based (as well as regression-based) removal of the cardiac artifact might also distort the neural contributions to the aperiodic signal, so even methods to adequately address the cardiac artifact might not solve the problem explained in the study. This poses an additional potential confound to the "M/EEG without ECG" conditions.

      The reviewer is correct in stating that, if an “artifactual” signal is not always present but appears and disappears (e.g. eye-blinks), neural activity may be distorted in periods where the “artifactual” signal is absent. However, while this plausibly presents a problem for ocular activity, there is no obvious reason to believe that it applies to cardiac activity. While the ECG signal is non-stationary in nature, it is remarkably more stable than eye movements in the healthy populations we analyzed (especially at rest). Accordingly, the cardiac “artifact” was consistently present across the entirety of the MEG recordings we visually inspected.

      Literature Analysis, Page 23: was there a method applied to address studies that report reducing artifacts in general, but are not specific to a single type of artifact? For example, there are automated methods for cleaning EEG data that use ICLabel (a machine learning algorithm) to delete "artifact" components. Within these studies, the cardiac artifact will not be mentioned specifically, but is included under "artifacts".

      The literature analysis was largely performed automatically and focused solely on ECG-related activity, as described in the methods section under Literature Analysis. If no ECG-related terms were used in the context of artifact rejection, a study was flagged as not having removed cardiac activity. This could indeed have been better highlighted, and we apologize for this oversight on our part. We now additionally link to these details, stating that:

      “However, an analysis of openly accessible M/EEG articles (N<sub>Articles</sub>=279; see Methods - Literature Analysis for further details) that investigate aperiodic activity revealed that only 17.1% of EEG studies explicitly mention that cardiac activity was removed and only 16.5% measure ECG (45.9% of MEG studies removed cardiac activity and 31.1% of MEG studies mention that ECG was measured; see Figure 1EF).”

      The reviewer makes a fair point that there is some uncertainty here, and our results probably present a lower bound of ECG handling in M/EEG research: when I manually rechecked the studies that were not initially flagged, they often only mentioned that “artifacts” were rejected. However, this information seemed too ambiguous to assume that cardiac activity was in fact accounted for. Again, this could have been stated more clearly, and we apologize for this oversight. This is now included in the methods section Literature Analysis, stating that:

      “All valid word contexts were then manually inspected by scanning the respective word context to ensure that the removal of “artifacts” was related specifically to cardiac and not e.g. ocular activity or the rejection of artifacts in general (without specifying which “artifactual” source was rejected in which case the manuscript was marked as invalid). This means that the results of our literature analysis likely present a lower bound for the rejection of cardiac activity in the M/EEG literature investigating aperiodic activity.”
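      A first-pass automated screen of the kind described here might look like the sketch below: it extracts a window of words around each mention of “artifact” and flags an article only when an ECG-related term appears in that window. The term lists, window size, and function names are hypothetical; the search terms actually used are described in Methods - Literature Analysis, and flagged contexts would still go to manual inspection.

```python
import re

# Hypothetical term lists for illustration only.
ECG_TERMS = ("ecg", "ekg", "electrocardiogra", "cardiac", "heartbeat")
ARTIFACT_TERMS = ("artifact", "artefact")

def artifact_contexts(text, window=10):
    """Word windows (+/- `window` words) around each artifact-related word."""
    words = re.findall(r"[\w-]+", text.lower())
    return [
        " ".join(words[max(0, i - window): i + window + 1])
        for i, w in enumerate(words)
        if w.startswith(ARTIFACT_TERMS)
    ]

def mentions_cardiac_removal(text, window=10):
    """True if any artifact context also contains an ECG-related term.

    Generic mentions of "artifacts" without a cardiac term return False, so a
    screen like this yields a lower bound on reported cardiac artifact handling.
    """
    return any(
        term in context
        for context in artifact_contexts(text, window)
        for term in ECG_TERMS
    )
```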

      Statistical inferences, page 23: as far as I can tell, no methods to control for multiple comparisons were implemented. Many of the statistical comparisons were not independent (or even overlapped with similar analyses in the full analysis space to a large extent), so I wouldn't expect strong multiple comparison controls. But addressing this point to some extent would be useful (or clarifying how it has already been addressed if I've missed something).

      In the present study we tried to minimize the risk of type I errors in several ways: A) weakly informative priors, B) robust regression models and C) a region of practical equivalence (ROPE; see Methods - Statistical Inference for further information) to define meaningful effects.

      Weakly informative priors can lower the risk of type I errors arising from multiple testing by shrinking parameter estimates towards zero (see e.g. Lemoine, 2019). Robust regression models describe the data with a Student's t distribution. This distribution has heavier tails than a Gaussian, meaning it allocates more probability to extreme values, which in turn minimizes the influence of outliers. The ROPE criterion ensures that only effects exceeding a negligible size are considered meaningful, representing a strict and conservative approach to interpreting our findings (see Kruschke, 2018; Cohen, 1988).
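
      A minimal sketch of the HDI+ROPE decision rule (in the spirit of Kruschke, 2018), assuming posterior samples of a single coefficient are available as an array. The ROPE of ±0.1 and the 95% HDI are placeholder assumptions, and the helper names are hypothetical; the bounds actually used are given in Methods - Statistical Inference.

```python
import numpy as np


def hdi(samples, cred_mass=0.95):
    """Narrowest interval containing `cred_mass` of the posterior samples."""
    x = np.sort(np.asarray(samples))
    n_in = int(np.ceil(cred_mass * len(x)))
    # widths of all candidate intervals spanning n_in consecutive sorted samples
    widths = x[n_in - 1:] - x[: len(x) - n_in + 1]
    i = int(np.argmin(widths))
    return x[i], x[i + n_in - 1]


# Placeholder ROPE of +/-0.1 (assumption, not the manuscript's bounds)
def rope_decision(samples, rope=(-0.1, 0.1), cred_mass=0.95):
    """Classify a parameter by comparing its HDI with the ROPE."""
    lo, hi = hdi(samples, cred_mass)
    if lo > rope[1] or hi < rope[0]:
        return "meaningful"        # HDI lies entirely outside the ROPE
    if lo >= rope[0] and hi <= rope[1]:
        return "practically zero"  # HDI lies entirely inside the ROPE
    return "undecided"             # HDI straddles a ROPE boundary
```

      The "undecided" category is what makes the criterion conservative: an effect whose HDI merely excludes zero but overlaps the ROPE is not counted as meaningful.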

      Furthermore, and more generally, we do not selectively report “significant” effects in situations in which multiple analyses were conducted on the same family of data (e.g. Figures 2 & 4). Instead, we provide joint inference across several plausible analysis options (akin to a specification curve analysis; Simonsohn, Simmons & Nelson, 2020) to give other researchers an overview of how different analysis choices impact the association between cardiac and neural aperiodic activity.
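
      The idea of reporting every specification rather than only the “significant” ones can be sketched as follows. This is a hypothetical illustration: each specification (e.g. one frequency range for the slope fit) carries an effect estimate with an interval, all specifications are ranked and reported, and the share whose interval lies outside a placeholder ROPE of ±0.1 (assumption) is summarized.

```python
from dataclasses import dataclass


@dataclass
class Spec:
    """One analysis option (hypothetical), e.g. a frequency range for the slope fit."""
    label: str
    estimate: float
    ci_low: float
    ci_high: float


def specification_curve(specs, rope=(-0.1, 0.1)):
    """Rank all specifications by effect size and report the share that clears the ROPE."""
    ranked = sorted(specs, key=lambda s: s.estimate)
    outside = [s for s in ranked if s.ci_low > rope[1] or s.ci_high < rope[0]]
    share = len(outside) / len(ranked) if ranked else 0.0
    return ranked, share  # every specification is returned, not only the "hits"
```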

      Lemoine, N. P. (2019). Moving beyond noninformative priors: why and how to choose weakly informative priors in Bayesian analyses. Oikos, 128(7), 912-928.

      Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification curve analysis. Nature Human Behaviour, 4(11), 1208-1214.

      Methods:

      Applying ICA components from 1Hz high pass filtered data back to the 0.1Hz filtered data leads to worse artifact cleaning performance, as the contribution of the artifact in the 0.1Hz to 1Hz frequency band is not addressed (see Bailey, N. W., Hill, A. T., Biabani, M., Murphy, O. W., Rogasch, N. C., McQueen, B., ... & Fitzgerald, P. B. (2023). RELAX part 2: A fully automated EEG data cleaning algorithm that is applicable to Event-Related-Potentials. Clinical Neurophysiology, result reported in the supplementary materials). This might explain some of the lower frequency slope results (which include a lower frequency limit <1Hz) in the EEG data - the EEG cleaning method is just not addressing the cardiac artifact in that frequency range (although it certainly wouldn't explain all of the results).

      We thank the reviewer for suggesting this interesting paper, which shows that lower high-pass filter cutoffs may be preferable to the more commonly used >1Hz high-pass filters for detecting ICA components that largely contain peripheral physiological activity. However, the results presented by Bailey et al. contradict the more commonly reported finding that a >1Hz high-pass filter is actually preferable (e.g. Winkler et al., 2015; Dimigen, 2020; Klug & Gramann, 2021), as well as the recommendations in widely used packages for M/EEG analysis (e.g. https://mne.tools/1.8/generated/mne.preprocessing.ICA.html). This discrepancy suggests that further research is needed to better understand which type of high-pass filtering is preferable in which situation. Furthermore, all findings on high-pass filtering for ICA component detection and removal that we are aware of relate to ocular activity. Given that ocular and cardiac activity have very different temporal and spectral patterns, it is probably worth investigating whether the classic 1Hz high-pass filter is also the best option for the detection and removal of cardiac activity. However, in our opinion this requires a dedicated investigation of its own.
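
      When ICA is fitted on 1Hz high-pass filtered data, the decomposition never sees the 0.1-1Hz band that Bailey et al. point to. A quick, hypothetical diagnostic (not part of our pipeline) is to check how much of an ECG channel's power actually falls into that band. The helper below is a plain periodogram sketch; the function name and band limits are assumptions.

```python
import numpy as np


def band_power_fraction(signal, fs, band=(0.1, 1.0)):
    """Share of a signal's non-DC spectral power that falls inside `band` (Hz)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                               # drop the DC offset
    power = np.abs(np.fft.rfft(x)) ** 2            # one-sided periodogram (unscaled)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return power[in_band].sum() / power[freqs > 0].sum()
```

      A large fraction would indicate that a sizeable part of the cardiac signal lies in the range that a 1Hz high-pass filter hides from the ICA fit.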

      We therefore now highlight this in our manuscript, stating that:

      “Additionally, it is worth noting that the effectiveness of an ICA crucially depends on the quality of the extracted components(63,64), and even widely suggested settings (e.g. high-pass filtering at 1Hz before fitting an ICA) may not be universally applicable (see supplementary material of (64)).”

      Winkler, I., Debener, S., Müller, K.-R., & Tangermann, M. (2015). On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP. In 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 4101-4105). Milan, Italy. doi: 10.1109/EMBC.2015.7319296.

      Dimigen, O. (2020). Optimizing the ICA-based removal of ocular EEG artifacts from free viewing experiments. NeuroImage, 207, 116117.

      Klug, M., & Gramann, K. (2021). Identifying key factors for improving ICA‐based decomposition of EEG data in mobile and stationary experiments. European Journal of Neuroscience, 54(12), 8406-8420.

      It looks like no methods were implemented to address muscle artifacts. These can affect the slope of EEG activity at higher frequencies. Perhaps the Riemannian Potato addressed these artifacts, but I suspect it wouldn't eliminate all muscle activity. As such, I would be concerned that remaining muscle artifacts affected some of the results, particularly those that included high frequency ranges in the aperiodic estimate. Perhaps if muscle activity were left in the EEG data, it could have disrupted the ability to detect a relationship between age and 1/f slope in a way that didn't disrupt the same relationship in the cardiac data (although I suspect it wouldn't reverse the overall conclusions given the number of converging results including in lower frequency bands). Is there a quick validity analysis the authors can implement to confirm muscle artifacts haven't negatively affected their results?

      I note that an analysis of head movement in the MEG is provided on page 32, but it would be more robust to show that removing ICA components reflecting muscle doesn't change the results. The results/conclusions of the following study might be useful for objectively detecting probable muscle artifact components: Fitzgibbon, S. P., DeLosAngeles, D., Lewis, T. W., Powers, D. M. W., Grummett, T. S., Whitham, E. M., ... & Pope, K. J. (2016). Automatic determination of EMG-contaminated components and validation of independent component analysis using EEG during pharmacologic paralysis. Clinical neurophysiology, 127(3), 1781-1793.

      We thank the reviewer for their suggestion. Muscle activity can indeed be a concern for the estimation of the spectral slope. This is precisely why we used head movements (as also noted by the reviewer) as a proxy for muscle activity. We also agree with the reviewer that this is not a perfect estimate. Additionally, the Riemannian potato would probably only capture epochs that contain transient, but not persistent, muscle activity.
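
      A minimal sketch of how a head-movement speed trace can be derived from continuous head-position estimates. It assumes positions as an (n_samples, 3) array in metres sampled at fs (e.g. from continuous head localization during MEG); the function name is hypothetical, and this is an illustration rather than the exact computation behind Supplementary Figure S9.

```python
import numpy as np


def head_speed(positions, fs):
    """Instantaneous head-movement speed from an (n_samples, 3) position trace."""
    pos = np.asarray(positions, dtype=float)
    # per-axis velocity via central differences (units per second)
    vel = np.gradient(pos, 1.0 / fs, axis=0)
    # speed = Euclidean norm of the 3-D velocity vector
    return np.linalg.norm(vel, axis=1)
```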

      The paper recommended by the reviewer describes a clever approach that uses the steepness of the spectral slope (or lack thereof) as an indicator of whether an independent component (IC) is driven by muscle activity. To determine an optimal threshold, Fitzgibbon et al. compared temporarily paralyzed with non-paralyzed subjects. They determined an expected “EMG-free” threshold for the spectral slope in paralyzed subjects and used this as a benchmark to detect ICs that were contaminated by muscle activity in non-paralyzed subjects.
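
      The logic of such a slope-based flag can be sketched as follows. The 7-70Hz fitting range and the threshold of -1.0 are placeholder assumptions, not the values reported by Fitzgibbon et al.; as argued in the next paragraph, a usable threshold would need device-specific validation. Components whose spectrum is flatter (slope above the threshold) than expected for EMG-free activity are flagged.

```python
import numpy as np


def spectral_slope(x, fs, fmin=7.0, fmax=70.0):
    """Log-log slope of the periodogram of `x` between fmin and fmax (Hz)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    sel = (freqs >= fmin) & (freqs <= fmax)
    slope, _intercept = np.polyfit(np.log10(freqs[sel]), np.log10(power[sel]), 1)
    return slope


# Placeholder threshold (assumption): flatter-than-threshold spectra suggest EMG
def flag_emg_components(sources, fs, threshold=-1.0):
    """Indices of components whose spectral slope is flatter than `threshold`."""
    return [i for i, s in enumerate(sources) if spectral_slope(s, fs) > threshold]
```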

      This is a great idea, but unfortunately it goes well beyond what we are able to sensibly estimate with our data, for the following reasons. The authors estimated their optimal threshold on EEG data from paralyzed subjects and showed that this threshold can feasibly be applied across different recordings. So for EEG data it might be feasible, at least as a first shot, to use their threshold on our data. However, we are measuring MEG, and as alluded to in our Discussion section under “Differences in aperiodic activity between magnetic and electric field recordings”, the spectral slope differs greatly between MEG and EEG recordings for non-trivial reasons. Furthermore, the spectral slope even seems to differ across different MEG devices. We noticed this when we initially tried to pool the data recorded in Salzburg with the Cambridge dataset. This means we would need to completely validate this procedure for the MEG data recorded in Cambridge and in Salzburg, which is not feasible considering that we A) don’t have direct access to one of the recording sites and B) even with such access, would face substantial hurdles in obtaining ethical approval for the experiment performed by Fitzgibbon et al.

      However, we think the approach brought forward by Fitzgibbon and colleagues is a clever way to remove muscle activity from EEG recordings whenever EMG was not directly recorded. We therefore now suggest in the Discussion section that, ideally, EMG should also be recorded, stating that:

      “It is worth noting that, apart from cardiac activity, muscle activity can also be captured in (non-)invasive recordings and may drastically influence measures of the spectral slope(72). To ensure that persistent muscle activity does not bias our results, we used changes in head movement velocity as a control analysis (see Supplementary Figure S9). However, it should be noted that this is only a proxy for the presence of persistent muscle activity. Ideally, studies investigating aperiodic activity should also be complemented by measurements of EMG. Whenever such measurements are not available, creative approaches that use the steepness of the spectral slope (or the lack thereof) as an indicator of whether, e.g., an independent component is driven by muscle activity are promising(72,73). However, these approaches may require further validation to determine how well myographic aperiodic thresholds transfer across the wide variety of different M/EEG devices.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) As outlined above, I recommend rephrasing the last section of the introduction to briefly summarize/introduce all main analysis steps undertaken in the study and why these were done (for example, it is only mentioned that the Cam-CAN dataset was used to study the impact of cardiac on MEG activity although the author used a variety of different datasets). Similarly, I am missing an overview of all main findings in the context of the study goals in the discussion. I believe clarifying the structure of the paper would not only provide a red thread to the reader but also highlight the efforts/strength of the study as described above.

      This is a good call! As suggested by the reviewer, we now try to give a clearer overview of what was investigated and why. We do so both at the end of the introduction, stating that: “Using the publicly available Cam-CAN dataset(28,29), we find that the aperiodic signal measured using M/EEG originates from multiple physiological sources. In particular, significant portions of age-related changes in aperiodic activity (normally attributed to neural processes) can be better explained by cardiac activity. This observation holds across a wide range of processing options and control analyses (see Supplementary S1), and was replicable on a separate MEG dataset. However, the extent to which cardiac activity accounts for age-related changes in aperiodic activity varies with the investigated frequency range and recording site. Importantly, in some frequency ranges and sensor locations, age-related changes in neural aperiodic activity still prevail. But does the influence of cardiac activity on the aperiodic spectrum extend beyond age? In a preliminary analysis, we demonstrate that working memory load modulates the aperiodic spectrum of “pure” ECG recordings. The direction of this working memory effect mirrors previous findings on EEG data(5), suggesting that the impact of cardiac activity goes well beyond aging. In sum, our results highlight the complexity of aperiodic activity while cautioning against interpreting it as solely “neural” without considering physiological influences.”

      and at the beginning of the discussion section:

      “Difficulties in removing ECG-related components from EEG signals via ICA might be attributable to various reasons, such as the number of available sensors or assumptions related to the non-Gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources (see Figure 1EF). Additionally, it is worth noting that the effectiveness of an ICA crucially depends on the quality of the extracted components(63,64), and even widely suggested settings (e.g. high-pass filtering at 1Hz before fitting an ICA) may not be universally applicable (see supplementary material of (64)).”

      (2) I found it interesting that the spectral slopes of ECG activity at higher frequency ranges (> 10 Hz) seem mostly related to HRV measures such as fractal and time domain indices and less so with frequency-domain indices. Do the authors have an explanation for why this is the case? Also, the analysis of the HRV measures and their association with aperiodic ECG activity is not explained in any of the method sections.

      We apologize for the oversight in not mentioning the HRV analysis in more detail in our methods section. We added a subsection to the Methods section entitled ECG Processing - Heart rate variability analysis to further describe the HRV analyses.

      “ECG Processing - Heart rate variability analysis

      Heart rate variability (HRV) was computed using the NeuroKit2 toolbox, a high-level tool for the analysis of physiological signals. First, the raw electrocardiogram (ECG) data were preprocessed by high-pass filtering the signal at 0.5Hz using an infinite impulse response (IIR) Butterworth filter (order = 5) and by smoothing the signal with a moving average kernel with the width of one period of 50Hz to remove the powerline noise (default settings of neurokit.ecg.ecg_clean). Afterwards, QRS complexes were detected based on the steepness of the absolute gradient of the ECG signal. Subsequently, R-peaks were detected as local maxima in the QRS complexes (default settings of neurokit.ecg.ecg_peaks; see (98) for a validation of the algorithm). From the cleaned R-R intervals, 90 HRV indices were derived, encompassing time-domain, frequency-domain, and non-linear measures. Time-domain indices included standard metrics such as the mean and standard deviation of the normalized R-R intervals, the root mean square of successive differences, and other statistical descriptors of interbeat interval variability. Frequency-domain analyses were performed using power spectral density estimation, yielding, for instance, low-frequency (0.04-0.15Hz) and high-frequency (0.15-0.4Hz) power components. Additionally, non-linear dynamics were characterized through measures such as sample entropy, detrended fluctuation analysis and various Poincaré plot descriptors. All these measures were then related to the slopes of the low-frequency (0.25 – 20 Hz) and high-frequency (10 – 145 Hz) aperiodic spectrum of the raw ECG.”
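
      For readers less familiar with the time-domain indices mentioned above, here is a minimal NumPy sketch of three of them (mean R-R interval, SDNN and RMSSD) computed from R-peak sample indices. It is not NeuroKit2's implementation; the index names follow common HRV conventions, and the function name is hypothetical.

```python
import numpy as np


def time_domain_hrv(rpeaks, fs):
    """Mean R-R interval, SDNN and RMSSD (all in ms) from R-peak sample indices."""
    rr_ms = np.diff(np.asarray(rpeaks, dtype=float)) / fs * 1000.0  # R-R intervals (ms)
    succ = np.diff(rr_ms)                                            # successive differences
    return {
        "MeanNN": rr_ms.mean(),
        "SDNN": rr_ms.std(ddof=1),             # sample standard deviation of R-R intervals
        "RMSSD": np.sqrt(np.mean(succ ** 2)),  # root mean square of successive differences
    }
```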

      Regarding the association between the ECG’s spectral slopes at high frequencies and frequency-domain indices of heart rate variability: common frequency-domain indices of heart rate variability fall in the range of 0.01-0.4Hz, which probably explains why we did not observe any association at higher frequency ranges (>10Hz).
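
      Frequency-domain HRV indices are computed on the beat-to-beat R-R series rather than on the raw ECG waveform, which is why they cannot reflect ECG content above 10Hz. A minimal sketch, assuming R-R intervals in milliseconds, linear resampling of the tachogram at 4Hz and a plain periodogram (all assumptions; NeuroKit2 uses its own defaults, and the function name is hypothetical):

```python
import numpy as np


def hrv_band_powers(rr_ms, fs_interp=4.0):
    """LF (0.04-0.15 Hz) and HF (0.15-0.4 Hz) power (ms^2) of an R-R interval series."""
    rr = np.asarray(rr_ms, dtype=float)
    beat_times = np.cumsum(rr) / 1000.0                          # time of each beat (s)
    grid = np.arange(beat_times[0], beat_times[-1], 1.0 / fs_interp)
    tachogram = np.interp(grid, beat_times, rr)                  # evenly resampled R-R series
    tachogram -= tachogram.mean()
    n = grid.size
    psd = 2.0 * np.abs(np.fft.rfft(tachogram)) ** 2 / (fs_interp * n)  # one-sided PSD
    freqs = np.fft.rfftfreq(n, d=1.0 / fs_interp)
    df = fs_interp / n
    lf = psd[(freqs >= 0.04) & (freqs < 0.15)].sum() * df
    hf = psd[(freqs >= 0.15) & (freqs < 0.4)].sum() * df
    return lf, hf
```

      Because the tachogram contains only about one sample per heartbeat, it carries no information above roughly half the heart rate (about 0.5Hz at 60 bpm), regardless of the resampling rate.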

      This is also stated in the related part of the results section:

      “In the higher frequency ranges (10 - 145 Hz) spectral slopes were most consistently related to fractal and time domain indices of heart rate variability, but not so much to frequency-domain indices assessing spectral power in frequency ranges < 0.4 Hz.”

      (3) Related to the previous point - what is being reflected in the ECG at higher frequency ranges, with regard to biological mechanisms? Results are being mentioned, but not further discussed. However, this point seems crucial because the age effects across the four datasets differ between low and high-frequency slope limits (Figure 2C).

      This is a great question that also requires further attention and investigation in general (see also Tereshchenko & Josephson, 2015). We investigated the change of the slope across frequency ranges that are typically captured in common ECG setups for adults (0.05 - 150Hz; Tereshchenko & Josephson, 2015; Kusayama, Wong, Liu et al., 2020). While most of the physiologically significant spectral information of an ECG recording lies between 1-50Hz (Clifford & Azuaje, 2006), meaningful information can be extracted at much higher frequencies. For instance, ventricular late potentials have a broader frequency band (~40-250Hz) that falls squarely within our spectral analysis window. Moreover, further meaningful information can be extracted at even higher frequencies (>100Hz). The exact physiological mechanisms underlying the so-called high-frequency QRS (HF-QRS) remain unclear (see Tereshchenko & Josephson, 2015; Qiu et al., 2024 for a review discussing possible mechanisms). At the same time, the HF-QRS seems to be highly informative for the early detection of myocardial ischemia and other cardiac abnormalities that may not yet be evident in the standard frequency range (Schlegel et al., 2004; Qiu et al., 2024). All optimism aside, it is also worth noting that ECG recordings at higher frequencies can capture skeletal muscle activity, with an overlapping frequency range of up to 400Hz (Kusayama, Wong, Liu et al., 2020). We now highlight all of this when introducing the analysis in the Results section as an outstanding research question, stating that:

      “However, substantially less is known about aperiodic activity above 0.4Hz in the ECG. Yet, common ECG setups for adults capture activity at a broad bandwidth of 0.05 - 150Hz(33,34).

      Importantly, much of the physiologically meaningful spectral information lies between 1-50Hz(35), similarly to M/EEG recordings. Furthermore, meaningful information can be extracted at much higher frequencies. For instance, ventricular late potentials have a broader frequency band (~40-250Hz(35)). Even higher frequencies (>100Hz) carry further meaningful information: the so-called high-frequency QRS seems to be highly informative for the early detection of myocardial ischemia and other cardiac abnormalities that may not yet be evident in the standard frequency range(36,37). Yet, the exact physiological mechanisms underlying the high-frequency QRS remain unclear (see (37) for a review discussing possible mechanisms).”

      Tereshchenko, L. G., & Josephson, M. E. (2015). Frequency content and characteristics of ventricular conduction. Journal of electrocardiology, 48(6), 933-937.

      Kusayama, T., Wong, J., Liu, X., et al. (2020). Simultaneous noninvasive recording of electrocardiogram and skin sympathetic nerve activity (neuECG). Nature Protocols, 15, 1853–1877. https://doi.org/10.1038/s41596-020-0316-6

      Clifford, G. D., & Azuaje, F. (2006). Advanced methods and tools for ECG data analysis (Vol. 10). P. McSharry (Ed.). Boston: Artech house.

      Qiu, S., Liu, T., Zhan, Z., Li, X., Liu, X., Xin, X., ... & Xiu, J. (2024). Revisiting the diagnostic and prognostic significance of high-frequency QRS analysis in cardiovascular diseases: a comprehensive review. Postgraduate Medical Journal, qgae064.

      Schlegel, T. T., Kulecz, W. B., DePalma, J. L., Feiveson, A. H., Wilson, J. S., Rahman, M. A., & Bungo, M. W. (2004, March). Real-time 12-lead high-frequency QRS electrocardiography for enhanced detection of myocardial ischemia and coronary artery disease. In Mayo Clinic Proceedings (Vol. 79, No. 3, pp. 339-350). Elsevier.

      (4) Page 10: At first glance, it is not quite clear what is meant by "processing option" in the text. Please clarify.

      Thank you for catching this! Upon re-reading, this is indeed a bit unclear. We have now replaced “processing options” with “slope fits” to make it clearer that we are talking about the percentage of effects based on the different slope fits.

      (5) The authors mention previous findings on age effects on neural 1/f activity (References Nr 5,8,27,39) that seem contrary to their own findings such as e.g., the mostly steepening of the slopes with age. Also, the authors discuss thoroughly why spectral slopes derived from MEG signals may differ from EEG signals. I encourage the authors to have a closer look at these studies and elaborate a bit more on why these studies differ in their conclusions on the age effects. For example, Tröndle et al. (2022, Ref. 39) investigated neural activity in children and young adults, hence, focused on brain maturation, whereas the CamCAN set only considers the adult lifespan. In a similar vein, others report age effects on 1/f activity in much smaller samples as reported here (e.g., Voytek et al., 2015).

      I believe taking these points into account by briefly discussing them, would strengthen the authors' claims and provide a more fine-grained perspective on aging effects on 1/f.

      The reviewer makes a very important point, as age-related differences in (neuro-)physiological activity are not necessarily comparable or strictly linear across different age cohorts (e.g. age-related changes in alpha center frequency). We have therefore added the suggested points to the Discussion section.

      “Differences in electric and magnetic field recordings aside, aperiodic activity may not change strictly linearly as we age, and studies looking at younger age groups (e.g. <22 years(44)) may capture different aspects of aging (e.g. brain maturation) than those looking at older subjects (>18 years; our sample). A recent report even shows first evidence of an interesting, putatively non-linear relationship with age in the sensorimotor cortex for resting recordings(59).”

      (6) The analysis of the working memory paradigm as described in the outlook-section of the discussion comes as a bit of a surprise as it has not been introduced before. If the authors want to convey with this study that, in general, aperiodic neural activity could be influenced by aperiodic cardiac activity, I recommend introducing this analysis and the results earlier in the manuscript than only in the discussion to strengthen their message.

      The reviewer is correct. This analysis really comes a bit out of the blue. However, this was exactly our intention in placing this analysis in the discussion. As the reviewer correctly noted, the aim was to suggest “that, in general, aperiodic neural activity could be influenced by aperiodic cardiac activity”. We placed this outlook directly after the discussion of “(neuro-)physiological origins of aperiodic activity”, where we highlight the potential challenges of interpreting drug-induced changes in M/EEG recordings. So the aim was to get the reader to think about whether age is the only feature affected by cardiac activity and then directly present some evidence that this might go beyond age.

      However, based on the reviewer’s comments we have reconsidered this approach; we have accordingly moved that paragraph to the end of the Results section and now introduce it at the end of the Introduction, stating that:

      “But does the influence of cardiac activity on the aperiodic spectrum extend beyond age? In a preliminary analysis, we demonstrate that working memory load modulates the aperiodic spectrum of “pure” ECG recordings. The direction of this working memory effect mirrors previous findings on EEG data(5) suggesting that the impact of cardiac activity goes well beyond aging.”

      (7) The font in Figure 2 is a bit hard to read (especially in D). I recommend increasing the font sizes where necessary for better readability.

      We agree with the Reviewer and increased the font sizes accordingly.

      (8) Text in the discussion: Figure 3B on page 10 => shouldn't it be Figure 4?

      Thank you for catching this oversight. We have now corrected this mistake.

      (9) In the third section on page 10, the Figure labels seem to be confused. For example, Figure 4 E is supposed to show "steepening effects", which should be Figure 4B I believe.

      Please check the figure labels in this section to avoid confusion.

      Thank you for catching this oversight. We have now corrected this mistake.

      (10) Figure Legend 4 I), please check the figure labels in the text

      Thank you for catching this oversight. We have now corrected this mistake.

      Reviewer #3 (Recommendations for the authors):

      I have a number of suggestions for improving the manuscript, which I have divided by section in the following:

      ABSTRACT:

      I would suggest re-writing the first sentences to make it easier to read for non-expert readers: "The power of electrophysiologically measured cortical activity decays with an approximately 1/f^X function. The slope of this decay (i.e. the spectral exponent, X) is modulated..."

      Thank you for the suggestion. We adjusted the sentence as suggested to make it easier for less technical readers to understand that “X” refers to the exponent.

      Including the age range that was studied in the abstract could be informative.

      Done as suggested.

      As an optional recommendation, I think it would increase the impact of the article if the authors note in the abstract that the current most commonly applied cardiac artifact reduction approaches don't resolve the issue for EEG data, likely due to an imperfect ability to separate the cardiac artifact from the neural activity with independent component analysis. This would highlight to the reader that they can't just expect to address these concerns by cleaning their data with typical cleaning methods.

      I think it would also be useful to convey in the abstract just how comprehensive the included analyses were (in terms of artifact reduction methods tested, different aperiodic algorithms and frequency ranges, and both MEG and EEG). Doing so would let the reader know just how robust the conclusions are likely to be.

      This is a brilliant idea! As suggested we added a sentence highlighting that simply performing an ICA may not be sufficient to separate cardiac contributions to M/EEG recordings and refer to the comprehensiveness of the performed analyses.

      INTRODUCTION:

      I would suggest re-writing the following sentence for readability: "In the past, aperiodic neural activity, other than periodic neural activity (local peaks that rise above the "power-law" distribution), was often treated as noise and simply removed from the signal"

      To something like: "In the past, aperiodic neural activity was often treated as noise and simply removed from the signal e.g. via pre-whitening, so that analyses could focus on periodic neural activity (local peaks that rise above the "power-law" distribution, which are typically thought to reflect neural oscillations)."

      We are happy to follow that suggestion.

      Page 3: please provide the number of articles that were included in the examination of the percentage that remove cardiac activity, and note whether the included articles could be considered a comprehensive or nearly comprehensive list, or just a representative sample.

      We stated the exact number of articles in the Methods section under Literature Analysis. However, we have now also added it to the Introduction on page 3, as suggested by the reviewer. The selection of articles was done automatically, based on a list of pre-specified terms, and exclusively focused on articles that had terms related to aperiodic activity in their title (see Literature Analysis). Therefore, I would personally be hesitant to call it a comprehensive or nearly comprehensive list of the general M/EEG literature, as the analysis of aperiodic activity is still relatively niche compared to the more commonly investigated evoked potentials or oscillations. I think whether or not a reader perceives our analysis as comprehensive should be up to them to decide, and is not something I want to impose on them. This is exacerbated by the fact that the analysis of neural aperiodic activity has rapidly gained traction over the last years (see Figure 1D, orange) and that the literature analysis was performed almost 2 years ago; therefore, in my eyes, it only represents a glimpse of the rapidly evolving field of aperiodic activity analysis.

      Figure 1E-F: It's not completely clear that the "Cleaning Methods" part of the figure indicates just methods to clean the cardiac artifact (rather than any artifact). It also seems that ~40% of EEG studies do not apply any cleaning methods even from within the studies that do clean the cardiac artifact (if I've read the details correctly). This seems unlikely. Perhaps there should be a bar for "other methods", or "unspecified"? Having said that, I'm quite familiar with the EEG artifact reduction literature, and I would be very surprised if ~40% of studies cleaned the cardiac artifact using a different method to the methods listed in the bar graph, so I'm wondering if I've misunderstood the figure, or whether the data capture is incomplete / inaccurate (even though the conclusion that ICA is the most common method is almost certainly accurate).

      The cleaning is indeed focused on cardiac activity specifically. This was, however, also mentioned in the caption of Figure 1: “We were further interested in determining which artifact rejection approaches were most commonly used to remove cardiac activity, such as independent component analysis (ICA(22)), singular value decomposition (SVD(23)), signal space separation (SSS(24)), signal space projections (SSP(25)) and denoising source separation (DSS(26)).” and in the Methods section under Literature Analysis. However, we adjusted Figure 1EF to make it more obvious that the described cleaning methods were only related to the ECG. Aside from using blind source separation techniques such as ICA, a considerable number of studies mentioned that they cleaned their data based on visual inspection (which was not further considered). Furthermore, it has to be noted that studies were only marked as having separated cardiac from neural activity when this was mentioned explicitly.

      RESULTS:

      Page 6: I would delete the "from a neurophysiological perspective" clause, which makes the sentence more difficult to read and isn't so accurate (frequencies 13-25Hz would probably more commonly be considered mid-range rather than low or high). Additionally, both frequency ranges include 15Hz, but the next sentence states that the ranges were selected to avoid the knee at 15Hz, which seems to be a contradiction. Could the authors explain in more detail how the split addresses the 15Hz knee?

      We removed the “from a neurophysiological perspective” clause as suggested. With regard to the “knee” at ~15Hz, I would like to refer the reviewer to Supplementary Figure S1. The knee frequency varies substantially across subjects, so splitting the data at a single exact frequency did not seem appropriate. Additionally, we found only spurious significant age-related variations in knee frequency (i.e. in only one out of the 4 datasets; not shown).

      Furthermore, we wanted to better connect our findings to our MEG results in Figure 4 and also give readers a holistic overview of how different frequency ranges in the aperiodic ECG would be affected by age. To fulfill all of these objectives, we decided to fit slopes with the respective upper/lower bounds placed within 5Hz above and below the average knee frequency of ~15Hz across datasets.
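
      To illustrate why a single split frequency is problematic, assume the knee is parameterized as in the "knee" aperiodic mode of FOOOF/specparam (an assumption; the fitting tool is not named in this passage), with log-power L, offset b, knee parameter k and exponent χ:

```latex
L(f) = b - \log_{10}\left(k + f^{\chi}\right), \qquad
f_{\mathrm{knee}} = k^{1/\chi}
```

      With χ = 2, a knee at 15Hz corresponds to k = 15^2 = 225, whereas with χ = 1.5 the same knee frequency corresponds to k = 15^1.5 ≈ 58.1. Because k and χ are estimated per subject, the knee frequency shifts from subject to subject.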

      The later parts of this same paragraph refer to a large number of different frequency ranges, but only the "low" and "high" frequency ranges were previously mentioned. Perhaps the explanation could be expanded to note that multiple lower and upper bounds were tested within each of these low and high frequency windows?

      This is a good catch; we adjusted the sentence as suggested. We now write: “... slopes were fitted individually to each subject's power spectrum in several lower (0.25 – 20 Hz) and higher (10-145 Hz) frequency ranges.”
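The multi-window fitting can be illustrated with a minimal sketch. It assumes a plain least-squares line in log-log space as a stand-in for the actual parametrization (the manuscript uses IRASA, with FOOOF as a check), and the window grid (`low_windows`, `high_windows`) is hypothetical, chosen only so that the bounds between the two ranges fall within 15 Hz ± 5 Hz.

```python
import numpy as np

def fit_slope(freqs, psd, fmin, fmax):
    """Least-squares slope of log10(power) vs. log10(frequency) within [fmin, fmax]."""
    mask = (freqs >= fmin) & (freqs <= fmax)
    slope, _intercept = np.polyfit(np.log10(freqs[mask]), np.log10(psd[mask]), deg=1)
    return slope

# Hypothetical grid: lower-range fits end, and higher-range fits start,
# somewhere between 10 and 20 Hz (i.e. 15 Hz +/- 5 Hz).
low_windows = [(0.25, upper) for upper in (10, 15, 20)]
high_windows = [(lower, 145) for lower in (10, 15, 20)]

def fit_all_windows(freqs, psd):
    """Return {(fmin, fmax): slope} for every window in both ranges."""
    return {win: fit_slope(freqs, psd, *win) for win in low_windows + high_windows}

# Synthetic check: a pure 1/f^2 spectrum has a slope of -2 in every window.
freqs = np.arange(0.25, 145.25, 0.25)
psd = freqs ** -2.0
slopes = fit_all_windows(freqs, psd)
```

Fitting each subject's spectrum over a grid like this, rather than at one cut-off, is what makes the results robust to between-subject variation in the knee.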

      The following two sentences seem to contradict each other: "Overall, spectral slopes in lower frequency ranges were more consistently related to heart rate variability indices(> 39.4% percent of all investigated indices)" and: "In the lower frequency range (0.25 - 20Hz), spectral slopes were consistently related to most measures of heart rate variability; i.e. significant effects were detected in all 4 datasets (see Figure 2D)." (39.4% is not "most").

      The reviewer is correct in stating that 39.4% is not “most”. However, 39.4% is the lower bound and refers to only one dataset. In the other three datasets, the percentage of effects was above 64%, which can be categorized as “most” (i.e. above 50%). We agree that the sentence was somewhat ambiguous, so we added the other percentages as well as a reference to Figure 2D to make this point clearer.

      Figure 2D: it isn't clear what the percentages in the semi-circles reflect, nor why some semi-circles are more full circles while others are only quarter circles.

      The percentages in the semi-circles reflect the proportion of significant effects (marked in red) and null effects (marked in green) per dataset, averaged across the different measures of HRV. For some frequency ranges fewer effects were found, resulting in quarter circles instead of semi-circles.
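As an illustration of how such a percentage could be computed, the sketch below assumes a hypothetical boolean table `significant` for one dataset (HRV measures × frequency ranges); the share of significant outcomes is taken per HRV measure and then averaged across measures. The table values are invented for the example, not taken from the study.

```python
import numpy as np

def percent_effects(significant):
    """Share of significant vs. null effects for one dataset.

    significant: boolean array, shape (n_hrv_measures, n_frequency_ranges);
    True where the slope/HRV relationship was significant.
    The share is computed per HRV measure and then averaged across measures.
    """
    significant = np.asarray(significant, dtype=bool)
    per_measure = significant.mean(axis=1)
    pct_effects = 100.0 * per_measure.mean()
    return pct_effects, 100.0 - pct_effects

# Hypothetical outcomes: 4 HRV measures x 5 frequency ranges.
significant = [
    [True, True, False, False, False],
    [True, True, True, False, False],
    [True, False, False, False, False],
    [True, True, False, False, False],
]
pct_effects, pct_null = percent_effects(significant)
```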

      Page 8: I think the authors could make it more clear that one of the conditions they were testing was the ECG component of the EEG data (extracted by ICA then projected back into the scalp space for the temporal response function analysis).

      As suggested by the reviewer, we adjusted our wording and replaced the somewhat ambiguous “... projected back separately” with “... projected back into the sensor space”. We thank the reviewer for this recommendation, as it does indeed make the procedure easier to understand.

      “After pre-processing (see Methods) the data was split into three conditions using an ICA(22). Independent components that were correlated (at r > 0.4; see Methods: MEG/EEG Processing - pre-processing) with the ECG electrode were either not removed from the data (Figure 3ABCD - blue), removed from the data (Figure 3ABCD - orange) or projected back into the sensor space (Figure 3ABCD - green).”
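The three conditions can be sketched with plain linear algebra. This is a minimal illustration, not the authors' pipeline: it assumes the decomposition has already been computed (a known random mixing matrix stands in for the fitted fastica solution used in the manuscript), applies the r > 0.4 correlation criterion against the ECG electrode, and uses hypothetical function and variable names.

```python
import numpy as np

def split_conditions(sources, mixing, ecg, threshold=0.4):
    """Build the three conditions from a linear decomposition X = mixing @ sources.

    sources:  (n_components, n_times) component time courses
    mixing:   (n_channels, n_components) mixing matrix (components -> sensors)
    ecg:      (n_times,) signal of the ECG electrode
    """
    # Absolute Pearson correlation of each component with the ECG electrode.
    r = np.array([abs(np.corrcoef(comp, ecg)[0, 1]) for comp in sources])
    ecg_idx = np.flatnonzero(r > threshold)
    keep_idx = np.flatnonzero(r <= threshold)

    not_rejected = mixing @ sources                     # ECG components kept
    rejected = mixing[:, keep_idx] @ sources[keep_idx]  # ECG components removed
    components = mixing[:, ecg_idx] @ sources[ecg_idx]  # only ECG components, in sensor space
    return not_rejected, rejected, components, ecg_idx

# Toy data: component 0 carries a spiky, periodic "heartbeat" (~72 bpm), the rest are noise.
rng = np.random.default_rng(0)
n_times = 5000
t = np.arange(n_times) / 500.0                  # 10 s at a hypothetical 500 Hz
ecg = np.abs(np.sin(2 * np.pi * 0.6 * t)) ** 20
sources = rng.standard_normal((5, n_times))
sources[0] = ecg + 0.1 * rng.standard_normal(n_times)
mixing = rng.standard_normal((8, 5))             # 8 hypothetical sensors

not_rejected, rejected, components, ecg_idx = split_conditions(sources, mixing, ecg)
```

Because the decomposition is linear, the "rejected" and "components" conditions add up exactly to the "not rejected" data. Raising `threshold` (e.g. to 0.8) flags fewer components and leaves more residual cardiac activity in `rejected`; lowering it risks assigning neural activity to `components`.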

      Figure 4A: standardized beta coefficients for the relationship between age and spectral slope could be noted to provide improved clarity (if I'm correct in assuming that is what they reflect).

      This was indeed shown in Figure 4A and noted in the color bar as “average beta (standardized)”. We do not specifically highlight this in the text, because the exact coefficients depend both on the analyzed frequency range and on the selected electrodes.
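For a single predictor, a standardized beta is the regression slope after z-scoring both variables, and it is then numerically equal to Pearson's r. The sketch below assumes hypothetical arrays `age` (subjects) and `slopes` (subjects × channels) and averages the per-channel standardized betas; this is one plausible reading of “average beta (standardized)”, not a description of how Figure 4A was produced.

```python
import numpy as np

def zscore(x, axis=0):
    """z-score along an axis (population standard deviation)."""
    return (x - x.mean(axis=axis, keepdims=True)) / x.std(axis=axis, keepdims=True)

def standardized_betas(age, slopes):
    """Standardized beta of age predicting the spectral slope, one value per channel.

    age:    (n_subjects,)
    slopes: (n_subjects, n_channels)
    With both variables z-scored, the OLS slope is mean(z_age * z_slope),
    which equals Pearson's r for a single predictor.
    """
    return (zscore(age)[:, None] * zscore(slopes, axis=0)).mean(axis=0)

# Toy data: slopes become more negative (steeper) with age, plus noise.
rng = np.random.default_rng(1)
age = rng.uniform(18, 80, 200)
slopes = -0.01 * age[:, None] + 0.3 * rng.standard_normal((200, 3))
betas = standardized_betas(age, slopes)
avg_beta = betas.mean()  # average over channels
```

This also shows why a single coefficient is hard to report: `avg_beta` changes with the channel selection and with the fitting range that produced `slopes`.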

      Figure 4I: The regressions explained at this point seem to contain a very large number of potential predictors, as I'm assuming they include all sensors for both the ECG component and ECG rejected conditions? (if that is not the case, it could be explained in greater detail). I'm also not sure about the logic of taking a complete signal, decomposing it with ICA to separate out the ECG and non-ECG signals, then including them back into the same regression model. It seems that there could be some circularity or redundancy in doing so. However, I'm not confident that this is an issue, so would appreciate the authors explaining why this is a valid approach (if that is the case).

      After observing significant effects in both the MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> conditions in similar frequency bands, we wanted to understand whether these age-related changes are statistically independent. To test this, we added both variables as predictors in a regression model (thereby accounting for the influence of each on age while controlling for the other). The regression models were therefore not very complex. They were built using only two predictors, namely the data (in a specific frequency range) averaged over the channels on which we noticed significant effects in the ECG rejected and ECG component data, respectively (Wilkinson notation: age ~ 1 + ECG rejected + ECG components). This was also described in the Results section: “To see if MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub> explain unique variance in aging at frequency ranges where we noticed shared effects, we averaged the spectral slope across significant channels and calculated a multiple regression model with MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> as predictors for age (to statistically control for the effect of MEG<sub>ECG component</sub>s and MEG<sub>ECG rejected</sub> on age). This analysis was performed to understand whether the observed shared age-related effects (MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub>) are in(dependent).”
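A minimal numpy sketch of the model age ~ 1 + ECG rejected + ECG components is shown below. It assumes hypothetical per-subject vectors `slope_rejected` and `slope_component` that already hold the spectral slope averaged over the significant channels, and z-scores the predictors so their coefficients are comparable. It is an illustration with simulated data, not the authors' code.

```python
import numpy as np

def fit_age_model(age, slope_rejected, slope_component):
    """OLS fit of age ~ 1 + ECG rejected + ECG components (z-scored predictors).

    Returns (intercept, beta_rejected, beta_component, r_squared).
    """
    def z(x):
        return (x - x.mean()) / x.std()

    X = np.column_stack([np.ones_like(age), z(slope_rejected), z(slope_component)])
    coef, *_ = np.linalg.lstsq(X, age, rcond=None)
    residuals = age - X @ coef
    r_squared = 1.0 - residuals.var() / age.var()
    return coef[0], coef[1], coef[2], r_squared

# Simulated subjects with deliberately correlated predictors.
rng = np.random.default_rng(2)
n = 500
slope_rejected = rng.standard_normal(n)
slope_component = 0.5 * slope_rejected + rng.standard_normal(n)
zr = (slope_rejected - slope_rejected.mean()) / slope_rejected.std()
zc = (slope_component - slope_component.mean()) / slope_component.std()
age = 50 + 3 * zr + 6 * zc + rng.standard_normal(n)

intercept, b_rej, b_comp, r2 = fit_age_model(age, slope_rejected, slope_component)
```

Even though the two predictors are correlated, each coefficient estimates that predictor's unique contribution to age, which is the question the model was meant to answer.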

      We hope this explanation resolves the previous misunderstanding.

      The explanation of results for relationships between spectral slopes and aging reported in Figure 4 refers to clusters of effects, but the statistical inference methods section doesn't explain how these clusters were determined.

      The term “cluster” was used to describe a “category” of effects (e.g. null effects). We changed the wording from “cluster” to “category” to make this clearer and now state: “This analysis, which is depicted in Figure 4, shows that over a broad amount of individual fitting ranges and sensors, aging resulted in a steepening of spectral slopes across conditions (see Figure 4E) with “steepening effects” observed in 25% of the processing options in MEG<sub>ECG not rejected</sub>, 0.5% in MEG<sub>ECG rejected</sub>, and 60% for MEG<sub>ECG components</sub>. The second largest category of effects were “null effects” in 13% of the options for MEG<sub>ECG not rejected</sub>, 30% in MEG<sub>ECG rejected</sub>, and 7% for MEG<sub>ECG components</sub>.”

      Page 12: can the authors clarify whether these age related steepenings of the spectral slope in the MEG are when the data include the ECG contribution, or when the data exclude the ECG? (clarifying this seems critical to the message the authors are presenting).

      We apologize for not making this clearer. We now write: “This analysis also indicates that a vast majority of observed effects irrespective of condition (ECG components, ECG not rejected, ECG rejected) show a steepening of the spectral slope with age across sensors and frequency ranges.”

      Page 13: I think it would be useful to describe how much variance was explained by the MEG-ECG rejected vs MEG-ECG component conditions for a range of these analyses, so the reader also has an understanding of how much aperiodic neural activity might be influenced by age (vs if the effects are really driven mostly by changes in the ECG).

      With regard to the explained variance, I think that the very important question of how strongly age influences changes in aperiodic activity is a topic better suited for a meta-analysis, as effect sizes seem to vary considerably across samples; e.g. for EEG, results in the literature were reported at r=-0.08 (Cesnaite et al. 2023), r=-0.26 (Cellier et al. 2021), r=-0.24/r=-0.28/r=-0.35 (Hill et al. 2022) and r=0.5/r=0.7 (Voytek et al. 2015). I would refer the reader/reviewer to the standardized beta coefficients depicted in Figure 4A as a measure of effect size in the current study.
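For orientation on the reviewer's question about explained variance, the correlations cited above can be converted to shared variance (r²). The arithmetic below uses only the values quoted in this response; it says nothing about the present dataset.

```python
# Pearson r values for age vs. aperiodic EEG activity, as cited above.
cited_r = {
    "Cesnaite et al. 2023": [-0.08],
    "Cellier et al. 2021": [-0.26],
    "Hill et al. 2022": [-0.24, -0.28, -0.35],
    "Voytek et al. 2015": [0.5, 0.7],
}

# Shared variance is r squared; expressed here as a percentage.
shared_variance_pct = {
    study: [100 * r ** 2 for r in rs] for study, rs in cited_r.items()
}
```

The resulting range, from well under 1% to about 49% of the variance, illustrates how strongly reported effect sizes depend on the sample.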

      Cellier, D., Riddle, J., Petersen, I., & Hwang, K. (2021). The development of theta and alpha neural oscillations from ages 3 to 24 years. Developmental cognitive neuroscience, 50, 100969.

      Cesnaite, E., Steinfath, P., Idaji, M. J., Stephani, T., Kumral, D., Haufe, S., ... & Nikulin, V. V. (2023). Alterations in rhythmic and non‐rhythmic resting‐state EEG activity and their link to cognition in older age. NeuroImage, 268, 119810.

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076.

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38), 13257-13265.

      Also, if there are specific M/EEG sensors where the 1/f activity does relate strongly to age, it would be worth noting these, so future research could explore those sensors in more detail.

      I think it is difficult to make a clear claim about this for MEG data, as the exact location or type of the sensors may differ across manufacturers. Such a statement could more easily be made for source-projected data or for EEG electrodes, whose locations are standardized, e.g. according to the 10-20 system.

      DISCUSSION:

      Page 15: Please change the wording of the following sentence, as the way it is currently worded seems to suggest that the authors of the current manuscript have demonstrated this point (which I think is not the case): "The authors demonstrate that EEG typically integrates activity over larger volumes than MEG, resulting in differently shaped spectra across both recording methods."

      Apologies for the oversight! The reviewer is correct: we did not show this ourselves; the authors of the cited manuscript did. We corrected the sentence as suggested, and it now states:

      “Bénar et al. demonstrate that EEG typically integrates activity over larger volumes than MEG, resulting in differently shaped spectra across both recording methods.”

      Page 16: The authors mention the results can be sensitive to the application of SSS to clean the MEG data, but not ICA. I think it would be sensitive to the application of either SSS or ICA?

      This is correct and is also supported by Figure S7, as differences in ICA thresholds also affect the detection of age-related effects. We therefore adjusted the related sentences, which now state:

      “In case of the MEG signal this may include the application of Signal-Space-Separation algorithms (SSS(24,55)), different thresholds for ICA component detection (see Figure S7), high and low pass filtering, choices during spectral density estimation (window length/type etc.), different parametrization algorithms (e.g. IRASA vs FOOOF) and selection of frequency ranges for the aperiodic slope estimation.”

      It would be worth clarifying that the linked mastoid re-reference alone has been proposed to cancel out the ECG signal, rather than that a linked-mastoid re-reference improves the performance of the ICA separation (which could be inferred by the explanation as it's currently written).

      This is correct, and we adjusted the sentence accordingly. It now states:

      “Previous work(12,56) has shown that a linked mastoid reference alone was particularly effective in reducing the impact of ECG-related activity on aperiodic activity measured using EEG.”
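For readers unfamiliar with the procedure, a linked (averaged) mastoid re-reference subtracts the mean of the two mastoid channels from every channel. The numpy sketch below uses hypothetical channel names `M1`/`M2` (in MNE-Python the equivalent operation is `set_eeg_reference` with a list of the two mastoid channel names). It only shows the arithmetic, namely that any signal appearing identically on all channels, including the mastoids, cancels; it makes no claim about how much ECG is removed in real recordings.

```python
import numpy as np

def linked_mastoid_reference(data, ch_names, mastoids=("M1", "M2")):
    """Re-reference EEG to the average of the two mastoid channels.

    data:     (n_channels, n_times) array in the original recording reference
    ch_names: channel names matching the rows of data
    """
    idx = [ch_names.index(name) for name in mastoids]
    reference = data[idx].mean(axis=0)
    return data - reference  # broadcasting subtracts the reference from every channel

rng = np.random.default_rng(3)
ch_names = ["Fz", "Cz", "Pz", "M1", "M2"]
common = rng.standard_normal(1000)             # identical on every channel
local = 0.1 * rng.standard_normal((5, 1000))   # channel-specific activity
data = local + common
rereferenced = linked_mastoid_reference(data, ch_names)
```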

      The issue of the number of EEG channels could probably just be noted as a potential limitation, as could the issue of neural activity being mixed into the ECG component (although this does pose a potential confound to the M/EEG without ECG condition, I suspect it wouldn't be critical).

      This is indeed a very fair point, as a higher number of electrodes would probably make it easier to isolate ECG components in the EEG, which may be the reason why the separation did not work so well in our case. However, this is ultimately an empirical question, so we highlighted it in the Discussion section, stating that: “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources.”

      OUTLOOK:

      Page 19: Although there has been a recent trend to control for 1/f activity when examining oscillatory power, recent research suggests that this should only be implemented in specific circumstances, otherwise the correction causes more of a confound than the issue does. It might be worth considering this point with regards to the final recommendation in the Outlook section: Brake, N., Duc, F., Rokos, A., Arseneau, F., Shahiri, S., Khadra, A., & Plourde, G. (2024). A neurophysiological basis for aperiodic EEG and the background spectral trend. Nature Communications, 15(1), 1514.

      We want to thank the reviewer for recommending this very interesting paper! The authors of said paper present compelling evidence showing that, while peak detection above an aperiodic trend using methods like FOOOF or IRASA is a prerequisite to determine the presence of oscillatory activity, it’s not necessarily straightforward to determine which detrending approach should be applied to determine the actual power of an oscillation. Furthermore, the authors suggest that wrongfully detrending may cause larger errors than not detrending at all. We therefore added a sentence stating that: “However, whether or not periodic activity (after detection) should be detrended using approaches like FOOOF or IRASA still remains disputed, as incorrectly detrending the data may cause larger errors than not detrending at all(75).”

      RECOMMENDATIONS:

      Page 20: "measure and account for" seems like it's missing a word, can this be re-written so the meaning is more clear?

      Done as suggested. The sentence now states: “To better disentangle physiological and neural sources of aperiodic activity, we propose the following steps to (1) measure and (2) account for physiological influences.”

      I would re-phrase "doing an ICA" to "reducing cardiac artifacts using ICA" (this wording could be changed in other places also).

      I do not like to describe cardiac or ocular activity as artifactual per se. This is also why I used quotation marks whenever I mention the word “artifact” in association with the ECG or EOG. However, I do understand that the wording “doing an ICA” is a bit sloppy. We therefore reworded it throughout the manuscript, e.g. to “separating cardiac from neural sources using an ICA” and “separating physiological from neural sources using an ICA”.

      I would additionally note that even if components are identified as unambiguously cardiac, it is still likely that neural activity is mixed in, and so either subtracting or leaving the component will both be an issue (https://doi.org/10.1101/2024.06.06.597688). As such, even perfect identification of whether components are cardiac or not would still mean the issue remains (and this issue is also consistent across a considerable range of component based methods). Furthermore, current methods including wavelet transforms on the ICA component still do not provide good separation of the artifact and neural activity.

      This is definitely a fair point, and we also highlight it in our recommendations under point 3, stating that:

      “However, separating physiological from neural sources using an ICA is no guarantee that peripheral physiological activity is fully removed from the cortical signal. Even more sophisticated ICA based methods that e.g. apply wavelet transforms on the ICA components may still not provide a good separation of peripheral physiological and neural activity(76,77). This turns the process of deciding whether or not an ICA component is e.g. either reflective of cardiac or neural activity into a challenging problem. For instance, when we only extract cardiac components using relatively high detection thresholds (e.g. r > 0.8), we might end up misclassifying residual cardiac activity as neural. In turn, we can’t always be sure that using lower thresholds won’t result in misinterpreting parts of the neural effects as cardiac. Both ways of analyzing the data can potentially result in misconceptions.”

      Castellanos, N. P., & Makarov, V. A. (2006). Recovering EEG brain signals: Artifact suppression with wavelet enhanced independent component analysis. Journal of neuroscience methods, 158(2), 300-312.

      Bailey, N. W., Hill, A. T., Godfrey, K., Perera, M. P. N., Rogasch, N. C., Fitzgibbon, B. M., & Fitzgerald, P. B. (2024). EEG is better when cleaning effectively targets artifacts. bioRxiv, 2024-06.

      METHODS:

      Pre-processing, page 24: I assume the symmetric setting of fastica was used (rather than the deflation setting), but this should be specified.

      Indeed, the reviewer is correct: we used the default setting of FastICA implemented in MNE-Python, which calls the FastICA implementation in scikit-learn; this uses the “parallel” (symmetric) algorithm by default. We added this information to the text accordingly, stating that:

      “For extracting physiological “artifacts” from the data, 50 independent components were calculated using the fastica algorithm(22) (implemented in MNE-Python version 1.2; with the parallel/symmetric setting; note: 50 components were selected for MEG for computational reasons; for the analysis of EEG data no threshold was applied).”
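The practical difference between the two settings lies in how the unmixing vectors are kept apart. The deflation approach estimates components one at a time and orthogonalizes each new vector against those already found (Gram-Schmidt), whereas the parallel (symmetric) approach updates all vectors together and then applies a symmetric decorrelation, W ← (W Wᵀ)^(-1/2) W, so no component is privileged by estimation order. In scikit-learn this choice is the `algorithm` parameter of `FastICA` (`"parallel"` by default, or `"deflation"`). The numpy sketch below shows only the decorrelation step, not the complete algorithm.

```python
import numpy as np

def symmetric_decorrelation(W):
    """Return (W W^T)^(-1/2) W, whose rows are orthonormal.

    In parallel/symmetric FastICA this step follows every joint update
    of all unmixing vectors.
    """
    eigvals, eigvecs = np.linalg.eigh(W @ W.T)
    inv_sqrt = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return inv_sqrt @ W

rng = np.random.default_rng(4)
W = rng.standard_normal((4, 4))  # e.g. unmixing vectors after one fixed-point update
W_dec = symmetric_decorrelation(W)
```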

      Temporal response functions, page 26: can the authors please clarify whether the TRF is computed against the ECG signal for each electrode or sensor independently, or if all electrodes/sensors are included in the analysis concurrently? I'm assuming it was computed for each electrode and sensor separately, since the TRF was computed in both the forward and backwards direction (perhaps the meaning of forwards and backwards could be explained in more detail also - i.e. using the ECG to predict the EEG signal, or using the EEG signal to predict the ECG signal?).

      A TRF can also be conceptualized as a multiple regression model over time lags. This means that we used all channels to compute the forward and backward models. In the forward model, we predicted the signals of the M/EEG channels in a multivariate regression model using the ECG electrode as predictor. In the backward model, we predicted the ECG electrode signal based on the signals of all M/EEG channels. The forward model was used to depict the time window at which the ECG signal was encoded in the M/EEG recording, which appears at a time lag of 0, indicating volume conduction. The backward model was used to see how much information about the ECG was decodable from all channels combined.

      We tried to further clarify this approach in the methods section stating that:

      “We calculated the same model in the forward direction (encoding model; i.e. predicting M/EEG data in a multivariate model from the ECG signal) and backward direction (decoding model; i.e. predicting the ECG signal using all M/EEG channels as predictors).”
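To make the forward/backward distinction concrete, here is a minimal numpy sketch of a TRF as ridge regression over time lags. The lag window, the ridge parameter `alpha`, the helper names, and the white-noise stand-in for the ECG are all assumptions for illustration; the toolbox and settings actually used are those described in the manuscript's Methods.

```python
import numpy as np

def lag_matrix(x, lags):
    """Stack time-shifted copies of x (n_times, n_features), one block per lag.

    A positive lag k places x[t - k] in row t; samples shifted past the edges are zero.
    """
    n_times, n_feat = x.shape
    out = np.zeros((n_times, n_feat * len(lags)))
    for i, k in enumerate(lags):
        shifted = np.zeros_like(x)
        if k >= 0:
            shifted[k:] = x[:n_times - k]
        else:
            shifted[:k] = x[-k:]
        out[:, i * n_feat:(i + 1) * n_feat] = shifted
    return out

def ridge(X, Y, alpha):
    """Closed-form ridge solution (X^T X + alpha I)^-1 X^T Y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

lags = list(range(-5, 6))  # hypothetical lag window, in samples

def forward_trf(ecg, meeg, alpha=1.0):
    """Encoding model: lagged ECG predicts every M/EEG channel -> (n_lags, n_channels)."""
    return ridge(lag_matrix(ecg[:, None], lags), meeg, alpha)

def backward_model(ecg, meeg, alpha=1.0):
    """Decoding model: all lagged M/EEG channels jointly predict the ECG.

    Returns the weights and the in-sample correlation between prediction and ECG
    (a real analysis would cross-validate this).
    """
    X = lag_matrix(meeg, lags)
    w = ridge(X, ecg, alpha)
    return w, np.corrcoef(X @ w, ecg)[0, 1]

# Toy data: the "ECG" leaks into every channel instantaneously (volume conduction).
rng = np.random.default_rng(5)
n_times, n_channels = 4000, 6
ecg = rng.standard_normal(n_times)
gains = rng.uniform(0.5, 1.5, n_channels)
meeg = np.outer(ecg, gains) + 0.5 * rng.standard_normal((n_times, n_channels))

trf = forward_trf(ecg, meeg)
w, r = backward_model(ecg, meeg)
peak_lag = lags[int(np.argmax(np.abs(trf).mean(axis=1)))]
```

In this toy setting the forward TRF peaks at lag 0, the signature of volume conduction, and the backward model recovers the ECG almost perfectly by pooling all channels.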

      Page 27: the ECG data was fit using a knee, but it seems the EEG and MEG data was not.

      Does this difference pose any potential confound to the conclusions drawn? (having said this, Figure S4 suggests perhaps a knee was tested in the M/EEG data, which should perhaps be explained in the text also).

      This was indeed tested in a previous review round to ensure that our results do not depend on the presence or absence of a knee in the data. We therefore added Figure S4 but forgot to add a corresponding description to the text. We are sorry for this oversight and have added a paragraph to Text S1 accordingly:

      “Using FOOOF(5), we also investigated the impact of different slope fitting options (fixed vs. knee model fits) on the aperiodic age relationship (see Supplementary Figure S4). The results that we obtained from these analyses using FOOOF offer converging evidence with our main analysis using IRASA.”
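For context on the two fitting options: in the FOOOF parametrization the aperiodic component in log10 power is modeled as L(f) = b - log10(k + f^χ), with offset b, exponent χ and knee parameter k; the "fixed" mode sets k = 0, and the knee frequency is k^(1/χ). The sketch below simply evaluates both forms with arbitrary example parameters (knee placed at 15 Hz) to show how a knee flattens the spectrum below the knee frequency; it does not fit any data.

```python
import numpy as np

def aperiodic_fixed(freqs, offset, exponent):
    """'Fixed' aperiodic model, log10 power: offset - log10(f ** exponent)."""
    return offset - np.log10(freqs ** exponent)

def aperiodic_knee(freqs, offset, knee, exponent):
    """'Knee' aperiodic model, log10 power: offset - log10(knee + f ** exponent)."""
    return offset - np.log10(knee + freqs ** exponent)

def knee_frequency(knee, exponent):
    """Frequency (Hz) of the bend in the knee model: knee ** (1 / exponent)."""
    return knee ** (1.0 / exponent)

def loglog_slope(freqs, log_power, fmin, fmax):
    """Slope of log10 power against log10 frequency within [fmin, fmax]."""
    mask = (freqs >= fmin) & (freqs <= fmax)
    return np.polyfit(np.log10(freqs[mask]), log_power[mask], 1)[0]

# Arbitrary example parameters: exponent 2 with the knee placed at 15 Hz.
freqs = np.linspace(1, 100, 400)
offset, exponent = 1.0, 2.0
knee = 15.0 ** exponent
fixed = aperiodic_fixed(freqs, offset, exponent)
with_knee = aperiodic_knee(freqs, offset, knee, exponent)

# Below the knee the knee model is almost flat; well above it, it approaches the fixed slope.
low_slope = loglog_slope(freqs, with_knee, 1, 5)
high_slope = loglog_slope(freqs, with_knee, 80, 100)
```

Which model is appropriate therefore depends on whether the fitting range spans the knee, which is why both options were compared.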

      Page 32: my understanding of the result reported here is that cleaning with ICA provided better sensitivity to the effects of age on 1/f activity than cleaning with SSS. Is this accurate? I think this could also be reported in the main manuscript, as it will be useful to researchers considering how to clean their M/EEG data prior to analyzing 1/f activity.

      The reviewer is correct in stating that we overall detected slightly more “significant” effects when not additionally cleaning the data using SSS. However, I am somewhat wary of recommending omitting SSS maxfilter based solely on this information. The higher number of effects (when not employing SSS maxfilter) may well stem from other physiological sources (e.g. muscle activity) that are correlated with age and are removed when applying SSS maxfiltering. Basing the decision of whether or not to apply maxfilter on the number or size of effects may therefore not be the best idea. Instead, I think that the applicability of maxfilter for research questions related to aperiodic activity should be the topic of additional methodological research. We therefore now write in Text S1:

      “Considering that we detected fewer and weaker aperiodic effects when using SSS maxfilter, is it advisable to omit maxfilter when analyzing aperiodic signals? We don’t think that we can make such a judgment based on our current results. This is because it's unclear whether the reduction of effects stems from an additional removal of peripheral information (e.g. muscle activity; that may be correlated with aging) or is induced by the SSS maxfiltering procedure itself. As the use of maxfilter in detecting changes of aperiodic activity has not, to our knowledge, been the subject of analysis, we suggest that this should be the topic of additional methodological research.”

      Page 39, Figure S6 and Figure S8: Perhaps the caption could also briefly explain the difference between maxfilter set to false vs true? I might have missed it, but I didn't gain an understanding of what varying maxfilter would mean.

      Figure S6 shows the effect of ageing on the spectral slope averaged across all channels. Maxfilter set to false in AB) means that no maxfiltering using SSS was performed, whereas in CD) the data were additionally processed using the SSS maxfilter algorithm. We now describe this more clearly in the caption:

      “Supplementary Figure S6: Age-related changes in aperiodic brain activity are most prominently explained by cardiac components, irrespective of whether the data were maxfiltered using signal space separation (SSS) or not. AC) Age was used to predict the spectral slope (fitted at 0.1-145Hz) averaged across sensors at rest in three different conditions (ECG components not rejected [blue], ECG components rejected [orange], ECG components only [green]).”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Weber et al. investigate the role of 4 dopaminergic neurons of the Drosophila larva in mediating the association between an aversive high-salt stimulus and a neutral odor. The 4 DANs belong to the DL1 cluster and innervate non-overlapping compartments of the mushroom body, distinct from those involved in appetitive associative learning. Using specific driver lines, they show that activation of DAN-g1 is sufficient to mimic an aversive memory and that it is also necessary to form a high-salt memory of full strength, although optogenetic silencing of this neuron only partially affects the performance index. The authors use calcium imaging to show that DAN-g1 is not the only one that responds to salt. DAN-c1 and d1 also respond to salt, but they seem to play no role in the assays tested. DAN-f1, which does not respond to salt, can induce memory formation when optogenetically activated, but it is not necessary for salt-odor memory formation under normal conditions. However, silencing DAN-f1 together with DAN-g1 enhances the memory deficit of DAN-g1.

      Strengths:

      The paper therefore reveals that in the Drosophila larva, as in the adult, rewards and punishments are processed by exclusive sets of DANs and that a complex interaction between a subset of DANs mediates salt-odor association.

      Overall, the manuscript contributes valuable results that are useful for understanding the organization and function of the dopaminergic system. The behavioral role of the specific DANs is assessed using specific driver lines, which allow their function to be tested individually and in pairs. Moreover, the authors perform calcium imaging to test whether DANs are activated by salt, a prerequisite for inducing a negative association with it. Proper genetic controls are carried out throughout the manuscript.

      Weaknesses:

      The authors use two different approaches to silence dopaminergic neurons: optogenetics and induction of apoptosis. The results are not always consistent, and the authors could improve the presentation and interpretation of the data. Specifically, optogenetics seems a better approach than apoptosis, which can affect the overall development of the system, but apoptosis experiments are used to set the grounds of the paper.

      The physiological data would suggest the role of a certain subset of DANs in salt-odor association, but a different partially overlapping set seems to be necessary. This should be better discussed and integrated into the authors' conclusions. The EM data analysis reveals a non-trivial organization of sensory inputs into DANs, and it is hard to extrapolate a link to the functional data presented in the paper.

      We would like to thank reviewer 1 for the positive evaluation of our work and for the critical suggestions for improvement. In the new version of the manuscript, we have placed the optogenetic results at the center and moved some of the ablation experiments to the Supplement. We also discuss in detail the experimental differences in the results. In addition, we have softened our interpretation of the specificity of memory for salt. As a result, we now place more emphasis on the general role of DANs in aversive learning in the larva. These changes are now also summarized and explained more simply and clearly in the Discussion, along with a revised discussion of the EM data.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors show that dopaminergic neurons (DANs) from the DL1 cluster in Drosophila larvae are required for the formation of aversive memories. DL1 DANs complement pPAM cluster neurons which are required for the formation of attractive memories. This shows the compartmentalized network organization of how an insect learning center (the mushroom body) encodes memory by integrating olfactory stimuli with aversive or attractive teaching signals. Interestingly, the authors found that the 4 main dopaminergic DL1 neurons act redundantly, and that single-cell ablation did not result in aversive memory defects. However, ablation or silencing of a specific DL1 subset (DAN-f1,g1) resulted in reduced salt aversion learning, which was specific to salt but no other aversive teaching stimuli were tested. Importantly, activation of these DANs using an optogenetic approach was also sufficient to induce aversive learning in the presence of high salt. Together with the functional imaging of salt and fructose responses of the individual DANs and the implemented connectome analysis of sensory (and other) inputs to DL1/pPAM DANs, this represents a very comprehensive study linking the structural, functional, and behavioral role of DL1 DANs. This provides fundamental insight into the function of a simple yet efficiently organized learning center which displays highly conserved features of integrating teaching signals with other sensory cues via dopaminergic signaling.

      Strengths:

      This is a very careful, precise, and meticulous study identifying the main larval DANs involved in aversive learning using high salt as a teaching signal. This is highly interesting because it allows us to define the cellular substrates and pathways of aversive learning down to the single-cell level in a system without much redundancy. It therefore sets the basis to conduct even more sophisticated experiments and together with the neat connectome analysis opens the possibility of unraveling different sensory processing pathways within the DL1 cluster and integration with the higher-order circuit elements (Kenyon cells and MBONs). The authors' claims are well substantiated by the data and clearly discussed in the appropriate context. The authors also implement neat pathway analyses using the larval connectome data to its full advantage, thus providing network pathways that contribute towards explaining the obtained results.

      Weaknesses:

      While there is certainly room for further analysis in the future, the study is very complete as it stands. Suggestions for clarification are minor in nature.

      We would like to thank reviewer 2 for the positive evaluation of our work. In fact, follow-up work is already underway to further analyze the role of the individual DL1 DANs. We have addressed the constructive and detailed suggestions for improvement in our point-by-point responses in the “Recommendations for the authors” section.

      Reviewer #3 (Public Review):

      The study of Weber et al. provides a thorough investigation of the roles of four individual dopamine neurons for aversive associative learning in the Drosophila larva. They focus on the neurons of the DL-1 cluster which already have been shown to signal aversive teaching signals. However, the authors go far beyond the previous publications and test whether each of these dopamine neurons responds to salt or sugar, is necessary for learning about salt, bitter, or sugar, and is sufficient to induce a memory when optogenetically activated. In addition, previously published connectomic data is used to analyze the synaptic input to each of these dopamine neurons. The authors conclude that the aversive teaching signal induced by salt is distributed across the four DL-1 dopamine neurons, with two of them, DAN-f1 and DAN-g1, being particularly important. Overall, the experiments are well designed and performed, support the authors' conclusions, and deepen our understanding of the dopaminergic punishment system.

      Strengths:

      (1) This study provides, at least to my knowledge, the first in vivo imaging of larval dopamine neurons in response to tastants. Although the selection of tastants is limited, the results close an important gap in our understanding of the function of these neurons.

      (2) The authors performed a large number of experiments to probe for the necessity of each individual dopamine neuron, as well as combinations of neurons, for associative learning. This includes two different training regimens (1 or 3 trials), three different tastants (salt, quinine, and fructose) and two different effectors, one ablating the neuron, the other one acutely silencing it. This thorough work is highly commendable, and the results prove that it was worth it. The authors find that only one neuron, DAN-g1, is partially necessary for salt learning when acutely silenced, whereas a combination of two neurons, DAN-f1 and DAN-g1, is necessary for salt learning when either ablated or silenced.

      (3) In addition, the authors probe whether any of the DL-1 neurons is sufficient for inducing an aversive memory. They found this to be the case for three of the neurons, largely confirming previous results obtained by a different learning paradigm, parameters, and effector.

      (4) This study also takes into account connectomic data to analyze the sensory input that each of the dopamine neurons receives. This analysis provides a welcome addition to previous studies and helps to gain a more complete understanding. The authors find large differences in inputs that each neuron receives, and little overlap in input that the dopamine neurons of the "aversive" DL-1 cluster and the "appetitive" pPAM cluster seem to receive.

      (5) Finally, the authors try to link all the gathered information in order to describe an updated working model of how aversive teaching signals are carried by dopamine neurons to the larva's memory center. This includes important comparisons both between two different aversive stimuli (salt and nociception) and between the larval and adult stages.

      Weaknesses:

      (1) The authors repeatedly claim that they found/proved salt-specific memories. I think this is problematic to some extent.

      (1a) With respect to the necessity of the DL-1 neurons for aversive memories, the authors' notion of salt-specificity relies on a significant reduction in salt memory after ablating DAN-f1 and g1, and the lack of such a reduction in quinine memory. However, Fig. 5K shows a quite suspicious trend of an impaired quinine memory which might have been significant with a higher sample size. I therefore think it is not fully clear yet whether DAN-f1 and DAN-g1 are really specifically necessary for salt learning, and the conclusions should be phrased carefully.

      (1b) With respect to the results of the optogenetic activation of DL-1 neurons, the authors conclude that specific salt memories were established because the aversive memories were observed in the presence of salt. However, this does not prove that the established memory is specific to salt - it could be an unspecific aversive memory that potentially could be observed in the presence of any other aversive stimuli. In the case of DAN-f1, the authors show that the neuron does not even get activated by salt, but is inhibited by sugar. Why should activation of such a neuron establish a specific salt memory? At the current state, the authors clearly showed that optogenetic activation of the neurons does induce aversive memories - the "content" of those memories, however, remains unknown.

      (2) In many figures (e.g. figures 4, 5, 6, supplementary figures S2, S3, S5), the same behavioural data of the effector control is plotted in several sub-figures. Were these experiments done in parallel? If not, the data should not be presented together with results not gathered in parallel. If yes, this should be clearly stated in the figure legends.

      We would also like to thank reviewer 3 for the positive assessment of our work. As already mentioned by reviewer 1, we understand the criticism that the salt specificity attributed to the individual DANs is not always fully supported by our results. We have therefore rewritten the relevant passages, which are also cited by the reviewer. We have also addressed the second point of criticism in our manuscript. As the control groups were always measured in parallel with the experimental animals, we can present the data together in a sub-figure. We now state this clearly in the revised figure legends.

      Summary of recommendations to authors:

      Overall, the study is commendable for its systematic approach and solid methodology. Several weaknesses were identified, prompting the need for careful revisions of the manuscript:

      We thank the reviewers for their careful review of our manuscript. In the following sections, we aim to address their concerns as thoroughly as possible. A comprehensive point-by-point response can be found below.

      (1) The authors should reconsider their assertion of uncovering a salt-specific memory, as the evidence does not conclusively demonstrate the exclusive necessity of DAN-f1 and DAN-g1 for salt learning. In particular, the optogenetic activation of DAN-f1 leads to plasticity but this might not be salt-specific. The precise nature of the memory content remains elusive, warranting a nuanced rephrasing of the conclusions.

      We only partially agree. It is true that optogenetic activation of DANs does not allow us to comment on the salt specificity of the resulting memory. However, we used high salt concentrations during the test. Over the years, the Gerber lab has demonstrated in several papers that larvae recall an aversive odor-salt memory only if salt is present during the test (Gerber and Hendel, 2006; Niewalda et al. 2008; Schleyer et al. 2011; Schleyer et al. 2015). The US used for training has to be present during the test. Even at the same concentration, other aversive stimuli (e.g. bitter quinine) do not allow the larvae to recall this particular type of memory. So, if optogenetic activation of DAN-f1 establishes a memory that can be recalled on salt, we argue that it has to encode aspects of the salt information. On the other hand, we observe necessity for salt learning only for DAN-g1. Although very unlikely based on the current literature, we cannot fully exclude that activation of DAN-f1 establishes a yet unknown type of memory that can also be recalled on a salt plate. Therefore, we partially agree and have rephrased the entire manuscript accordingly to avoid over-interpreting our data. Throughout the manuscript we now avoid the term salt-specific memory and instead describe this type of memory as aversive memory.

      (2) A thorough examination or discussion about the potential influence of blue light aversion on behavioral observations is necessary to ensure a balanced interpretation of the findings.

      To address this point: every behavioral experiment that uses optogenetic blue light stimulation includes the appropriate and mandatory controls. For blue light activation experiments, two genetic controls are used that receive either the same blue light treatment (effector control, w1118>UAS-ChR2XXL) or no blue light treatment (dark control, XY-split-Gal4>UAS-ChR2XXL). For blue light inactivation experiments, one group is added that has exactly the same genotype but did not receive food containing retinal. These experiments show that blue light exposure itself induces neither an aversive nor an appetitive memory and does not impair the establishment of odor-high salt memory. In addition, we used the most recently established transgenes available. ChR2<sup>XXL</sup> is very sensitive to blue light; only 220 lux (60 µW/cm<sup>²</sup>) was necessary to obtain stable results. In our hands, short-term exposure of up to 5 minutes at such low intensities does not induce blue light aversion. Following the advice of the reviewer, we have also addressed this concern by adding several sentences to the related Results and Methods sections.

      (3) The authors should address the limitations associated with the use of rpr/hid for neuronal ablations, such as the effects of potential developmental compensation.

      We agree with this concern. It is quite possible that the ablation experiments induce compensatory effects during larval development. Such an effect may be the reason for differences in phenotypes when comparing hid,rpr ablation with optogenetic inhibition. This is now part of the discussion. In addition, we evaluated whether the ablation worked in our experiments. Previously, controls showing that expression of hid,rpr indeed leads to ablation of the DANs were missing. We have now added these experiments and show anatomically that the DANs are ablated (related to figure 4-figure supplement 6).

      (4) While the connectome analysis offers valuable insights into the observed functions of specific DANs in relation to their extrinsic (sensory) and intrinsic (state) inputs, integrating this data more cohesively within the manuscript through careful rewriting would enhance the coherence of the study.

      We understand this concern. The revised manuscript therefore integrates the EM data more closely into our interpretation of the results. We have rewritten the related parts throughout the manuscript and have completely revised the corresponding section in the Results.

      (5) More generally, the authors are encouraged to discuss internal discrepancies in the results of their functional manipulation experiments.

      Thank you for this suggestion. We agree that we had not given these divergent results enough space in the discussion. We have now changed this and addressed the concern comprehensively.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Here are some suggestions for clarification and improvement of the manuscript:

      (1) The authors should discuss why the silencing experiment with TH-GAL4 (Fig. 1) does not abolish memory formation (I assume that the PI should go to zero). Does it mean that other non-TH neurons are involved in salt-odor memory formation? Are there other lines that completely abolish this type of learning?

      Thank you very much for highlighting this crucial point. Indeed, the functional intervention does not completely eliminate the memory. There could be several reasons, or a combination thereof, for this outcome. For instance, it is plausible that the UAS-GtACR2 effector does not entirely suppress the activity of dopaminergic neurons. Additionally, the memory may comprise different types, not all of which are linked to dopamine function. It is also noteworthy that TH-Gal4 does not cover all dopaminergic neurons; even a neuron from the DL1 cluster is absent (as previously reported in Selcho et al., 2009). Because we use high salt concentrations in this experiment, it is conceivable that non-gustatory memories are formed based solely on the systemic effects of salt (e.g., increased osmotic pressure). These possibilities are now acknowledged in the text.

      (2) The Rpr experiments in Fig. 4 do not lead to any phenotype and there is a general assumption that the system compensates during development. However, there is no demonstration that Rpr worked or that development compensated for that. What do we learn from these data? Would it make sense to move them to the supplement to make the story more compact? In addition: the conclusion at L 236 "DL1.... Are not individually necessary" is later disproved by optogenetic silencing. Similarly, optogenetic silencing of f1+g1 affects 1X and 3X learning, but not when using Rpr. Moreover, Rpr did not give any phenotype in other data in the supplementary material. I'm not sure how valid these results are.

      We acknowledge this concern and have actively deliberated various options for restructuring the presented ablation data. Ultimately, we reached a consensus that relocating Figure 4 to the supplement is warranted. Furthermore, corresponding adjustments have been made in the text. This decision amplifies the significance of the optogenetic results. In addition, we also addressed the other part of the concern. We examined the efficacy of hid and rpr in our experiments. Indeed, we successfully ablated specific DANs, as illustrated in the new anatomical data presented in Figure 4- figure supplement 6, which strengthens the interpretation of the hid,rpr experiments.

      (3) In most figures that show data for 1X and 3X training, there is no difference between these two conditions (I would suggest moving one set to the supplement). When a difference appears (Fig. 5A-D), the implications are not discussed properly. Is it known that some circuits are necessary for the 1X but not for the 3X protocol? Is that a reasonable finding? I would expect the opposite, but I might lack knowledge here. However, the optogenetic silencing of the same neurons in Figure 7 shows the same phenotype for 1X and 3X. Again, the validity of the Rpr experiments seems debatable.

      Different training protocols lead to different memory phases (STM and STM+ARM), as we have shown previously (Widmann et al. 2016). Therefore, we are convinced that it makes sense to keep both data sets in the main manuscript. However, we agree that this was not properly introduced and discussed and have therefore made the respective changes in the manuscript.

      (4) In Figure 3, it is unclear what the responses were tested against. Since they are so small and noisy there would be a need for a control. Moreover, in some cases, it looks like the DF/F is normalized to the wrong value: e.g. in DAN-c1 100mM, the activity in 0-10s is always above zero, and in pPAM with fructose is always below zero. This might not have any consequence on the results but should be adjusted.

      Thank you very much for this criticism, which we greatly appreciate. We have carefully re-examined the data and found a mistake in the normalization of the values. We made the necessary adjustments to the evaluation, as per your suggestions. The updated figures, figure legends, and results have been incorporated into the new version of the manuscript. As noted by the reviewer, these corrections have not altered the interpretation of the data or the primary responses of the various DANs.

      (5) In the abstract: "Optogenetic activation of DAN-f1 and DAN-g1 alone suffices to substitute for salt punishment... Each DAN encodes a different aspect of salt punishment". These sentences might be misleading and an overstatement: only DAN-g1 shows a clear role, while the function of the other DANs in the context of salt-odor learning remains obscure.

      We have reworded the respective part of the abstract accordingly, aiming to avoid any overstatement.

      (6) The physiology is done in L1 larvae but behavior is tested in L3 larvae. There could be a change in this time that could explain the salt responses in c1 and d1 but no role in salt-odor learning?

      While we cannot dismiss the possibility of a developmental change from L1 to L3, a comparison of the anatomical data of the DL1 DANs from electron microscopy (EM) and light microscopy (LM) indicates that their overall morphology remains consistent. However, it is important to note that this observation does not address the physiological properties of these cells. Consequently, we have incorporated this concern into the discussion of the revised manuscript.

      (7) The introduction needs some editing starting at L 129, as it ends with a discussion of a previously published EM data analysis. I would rather suggest stating which questions are addressed in this paper and which methods will be used and perhaps a hint on the results obtained.

      We understand the concern. We have added a concise paragraph at the end of the introduction, highlighting the biological question, the technical approach, and a brief preview of the findings.

      (8) It is clear to me that the presentation of salt during the test is necessary for recall, however in L 166 I don't understand the explanation: how is the memory used in a beneficial way in the test? The salt is present everywhere and the odor cue is actually useless to escape it.

      Extensive research, exemplified by studies such as Schleyer et al. (2015) published in eLife, clearly demonstrates that odor-high salt memory is recalled only when larvae are tested on a high salt plate. Even when tested on a bitter quinine plate, the aversive memory is not recalled. This phenomenon is attributed to the omnipresent unconditioned stimulus (US) during the test, in our case high salt, which provides the motivation to recall the memory. Furthermore, the concentration of the stimulus plays a crucial role (Schleyer et al. 2011). The odor cue indicates where the situation could potentially be improved; if high salt is absent, however, this motivational drive is lacking, because expressing the memory would not improve an already favorable situation. Additionally, the motivation to escape the omnipresent, unpleasant high salt stimulus persists throughout the entire 5-minute test period.

      (9) L288: the fact that f1 shows a phenotype in this experiment does not mean that it encodes a salt signal, indeed it does not respond to salt. It perhaps induces a plasticity that can be recalled by salt, but not necessarily linked to salt. The synergy between f1 and g1 in the salt assay was postulated based on exp with Rpr, but the validity of these experiments is dubious. I'm not sure there is sufficient evidence from Figures 6 and 7 to support a synergistic action between f1 and g1.

      It is true that DAN-f1 alone is not necessary for mediating a high salt teaching signal based on ablation, optogenetic inhibition and even physiology. However, optogenetic activation of DAN-f1 alone establishes a memory that is expressed on a salt plate. Given the logic explained above, which is supported by several publications, we would like to keep the statement, especially as joint manipulation with DAN-g1 gives rise to significantly higher or lower values after joint optogenetic activation or inactivation (Figure 5E and F, Figure 6E and F in the new version). Nevertheless, we have modified the sentence. In the text we now describe these effects as "these results may suggest that DAN-f1 and DAN-g1 encode aspects of the natural aversive high salt teaching signal under the conditions that we tested". We think that this statement is appropriately qualified in three respects and would therefore like to keep it in this restricted form. However, we are happy to reconsider this if the reviewer thinks it is critical.

      (10) I find the EM analysis hard to read. First of all, because of the two different graphical representations used in Fig. 8, wouldn't one be sufficient to make the point? Secondly, I could not grasp a take-home-message: what do we learn from the EM data? Do they explain any of the results? It seems to me that they don't provide an explanation of why some DL1 neurons respond to salt and others don't.

      We understand that the EM analysis is hard to read and have carefully rewritten this part of the manuscript (see also general concern 4 above). The main take-home message is not an explanation of why some DL1 neurons respond to salt and others do not. This cannot be resolved because information on the salt-sensing receptor cells is missing: the EM volume lacks the peripheral nervous system, which forms the first layer of salt information processing. However, our analysis clearly shows that each of the 4 DANs has its own identity based on its connectivity. No two of them are the same, although some similarities exist. This nicely reflects the physiological and behavioral results. We have now clarified this in the Results to ease understanding for the readership. In addition, we state clearly that we do not address why some DL1 neurons respond to salt while others do not.

      (11) Do the manipulations (activation and silencing) affect odor preference in the presence of salt? Did the authors test that the two odors do not drive different behaviors on the salty plate? Or did they only test the odor preference on plain agarose? Can we exclude a role for the DAN in driving multisensory-driven innate behavior?

      Innate odor preferences are not changed by the presence of salt or even other tastants (this work, but see also Schleyer et al. 2015, Figure 3, eLife). Even the naïve choice between two odors is the same when tested in the presence of different tastants (Schleyer et al. 2015, Figure 3, eLife). This shows, at least for the tested stimuli and conditions, which are similar to the ones we use, that there is no multisensory-driven innate odor-taste behavior. Therefore, to our knowledge, experiments such as those suggested by the reviewer have never been done in larval odor-taste learning studies, and we suggest that DAN activation has no effect on innate larval behavior. However, we are happy to reconsider this if the reviewer thinks it is critical.

      (12) L 280: the authors generalize the conclusion to all DL1-DANs, but it does not apply to c1 and d1.

      Thanks for this comment. We deleted that sentence as suggested and thus no longer generalize the conclusion to all DL1-DANs.

      (13) L345: I do not see the described differences in Fig. 8F, presynaptic sites of both types seem to appear in rather broad regions: could the author try to clarify this?

      We understand that the anatomical description of the data is often hard to read, especially for readers who are not used to this kind of figure. We have therefore modified the text to ease understanding and clarify the differences between the labeled brain regions for a broad readership.

      (14) L373: the conclusion on c1 is unsupported by data: this neuron responds to both salt and fructose (Figure 3 ) while the conclusion is purely based on EM data analysis.

      The sentence is not a conclusion but a speculation and we also list the cell's response to positive and negative gustatory stimuli. Therefore, we do not understand exactly what the reviewer means here. However, we have tried to address the criticism and have revised the sentences.

      (15) L385: the data on d1 seem to be inconsistent with Eschbach 2020, but the authors do not discuss if this is due to the differential vs absolute training, or perhaps the presence of the US during the test (which does not seem to be there in Eschbach, 2020) - is the training protocol really responsible for this inconsistency? For f1 the data seem to be consistent across these studies. The authors should clarify how the exp in Fig 6 differs from Eschbach, 2020 and how one could interpret the differences.

      This concern is correct, and we now discuss the difference in more detail. Eschbach et al. used CsChrimson as a genetic tool, a one-odor paradigm with 3 training cycles, and no gustatory cues in their approach. These differences are now discussed in the new version of the manuscript.

      (16) L460-475 A long part of this paragraph discusses the similarities between c1 and d1 and corresponding PPL1 neurons in the adult fly. However, c1 and d1 do not really show any phenotype in this paper, I'm not sure what we learn from this discussion and how much this paper can contribute to it. I would have wished for a discussion of how one could possibly reconcile the observed inconsistencies.

      Based on the comments of the different reviewers several paragraphs in the discussion were modified. We agree that the part on the larval-adult comparison is quite long. Thus we have shortened it as suggested by the reviewer.

      Minor corrections:

      L28 "resultant association" maybe resulting instead.

      L55 "animals derive benefit": remove derive.

      L78 "composing 12,000 neurons": composed of.

      L79 what is stable in a "stable behavioral assay"?

      L104: 2 times cluste.

      L122: "DL1 DANs are involved" in what?

      Fig. 1 please check subpanels labels, D repeats.

      L 362: "But how do individual neurons contribute to the teaching signal of the complete cluster?" I don't understand the question.

      L364 I did not hear before about the "labeled line hypothesis" in this context - could the author clarify?

      L368: edit "combinatorically".

      L390: "current suppression" maybe acute suppression.

      L 400 I'm not sure what is meant by "judicious functional configuration" and "redundancy". The functions of these cells are not redundant, and no straightforward prediction of their function can be done from their physiological response to salt.

      Thanks a lot for your detailed review of our manuscript. We welcome your well-taken concerns and have made the requested changes for all points that you have raised.

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figure 1 the reconstruction of pPAM and DL1 DANs shows the compartmentalized innervation of the larval MB. However, the images are a bit low in color contrast to appreciate the innervation well. In particular in panel B, it is hard to identify the innervated MB body structure. A schematic model of the larval MB and DAN innervation domains like in Fig. 2A would help to clarify the innervation pattern to the non-specialist.

      We understand this concern and have changed figure 1 as suggested by the reviewer. A schematic model of the MB and DANs is now presented already in figure 1 as well as the according supplemental figure.

      (2) Blue light itself can be aversive for larvae and thus interfere with the aversive learning paradigm. Does the given Illuminance (220 lux) used in these experiments affect the behavior and learning outcome?

      In the past, high intensities of blue light were necessary to activate first-generation optogenetic tools, and the high-intensity blue light itself was able to establish an aversive memory (e.g. Rohwedder et al. 2016). Using second-generation optogenetic tools allowed us to strongly reduce the applied light intensity; we now use 220 lux (equal to 60 µW/cm<sup>2</sup>). Please note that all Gal4 and UAS controls in the manuscript are not significantly different from zero. The mild blue light stimulation therefore does not serve as a teaching signal and has neither an aversive nor an appetitive effect. Furthermore, we use this mild light intensity in several other behavioral paradigms (locomotion, feeding, naïve preferences) and have never observed an effect on behavior.

      (3) Fig.2: Except for MB054B-Gal4 only the MB expression pattern is shown for other lines. Is there any additional expression in other cells of the brain? In the legend in line 761, the reporter does not show endogenous expression, rather it is a fluorescent reporter signal labeling the mushroom body.

      The lines were initially identified in a screen of larval MB neurons done together with Jim Truman, Marta Zlatic and Bertram Gerber. In this screen, full brain scans were always analyzed. These images can be seen in Eschbach et al. 2020, extended figure 1. Neither in their evaluation nor in our anatomical evaluation (using a different protocol) was additional expression in brain cells detectable. We also modified the figure legend as suggested.

      (4) Fig.3: Precise n numbers per experiment should be stated in the figure legend.

      True, we now present n numbers per experiment whenever necessary.

      (5) Fig.4: Have the authors confirmed complete ablation of the targeted neuron using rpr/hid? Ablations can be highly incomplete depending on the onset and strength of Gal4 expression, leaving some functionality intact. While the ablation experiments are largely in line with the acute silencing of single DANs during high salt learning performed later on (Fig.7), there is potentially an interesting aspect of developmental compensation hidden in this data. Not a major point, but potentially interesting to check.

      We agree with this criticism. We had not tested whether expression of hid,rpr in DL1 DANs actually ablates them. Therefore, we performed an additional experiment to show this. The new data are now presented as a supplemental figure (Figure 4- figure supplement 6). The result shows that expression of hid,rpr also ablates DL1 DANs, similar to earlier experiments in which we used the same effectors to ablate serotonergic neurons (Huser et al., 2012, figure 5).

      (6) The performance index in Fig. 4 and 5 sometimes seems lower and the variability is higher than in some of the other experiments shown. Is this due to the high intrinsic variability of these particular experiments, or the background effects of the rpr/hid or splitGal4 lines?

      The general variability of these experiments is within the expected and known range. In this kind of experiment there is always some variation due to several external factors (e.g. the time of year at which experiments are performed). Therefore, it is always important to measure controls and experimental animals at the same time. We did so, and we only directly compare results within individual datasets, not between different datasets. Such comparisons are further hampered because the experiments of Figure 4 (now Figure 4- figure supplement 1) and Figure 5 (now Figure 4) differ in several parameters from the other learning experiments presented later in the text. Optogenetic activation uses blue light stimulation instead of "real world" high salt, and direct activation of specific DANs in the brain is most often more stable than external high salt stimulation. Optogenetic inactivation also uses blue light stimulation as well as retinal-supplemented food; both factors can affect the measurement. We thus argue that the variability of the results is most often determined by the particular parameters of each experiment rather than by background effects of the rpr/hid and split-Gal4 lines.

      (7) Fig.7: This is a neat experiment showing the effects of acute silencing of individual DL1 DANs. As silencing DAN-f1/g1 does not result in complete suppression of aversive learning, it would be highly interesting to test (or speculate about) additive or modulatory effects by the other DANs. Dan-c-1/d-1 also responds to high salt but does not show function on its own in these assays. I am aware that this is currently genetically not feasible. It would however be a nice future experiment.

      True. We screened intensively for DL1 cluster-specific driver lines that cover all 4 DL1 neurons or other combinations than the ones we tested, but unfortunately did not succeed in identifying them. Nevertheless, we will screen new genetic resources (e.g. Meissner et al., 2024, bioRxiv) to expand our approach in future experiments. Please also see our response to concern 1 of reviewer 1 for further technical limitations and biological explanations that could account for the incomplete suppression of high salt learning and memory. Some of these limitations are now also mentioned and discussed in the new version of the manuscript.

      (8) The discussion is excellent. I would just add that larval DAN-c1, which has high interconnectivity within the larval CNS, is likely integrating state-dependent network changes, similar to the role of some DANs in innate and state-dependent preference behavior. This might contribute to modulating learned behavior depending on the present (acute) and previous environmental conditions.

      Thanks a lot for bringing this up. We rewrote this part and added a discussion on recent work on DAN-c1 function in larvae as well as results on DAN function in innate and state-dependent preference behavior.

      (9) Citation in line 1115 missing access information: "Schnitzer M, Huang C, Luo J, Je Woo S, Roitman L, et al. 2023. Dopamine signals integrate innate and learned valences to regulate memory dynamics. Research Square".

      Unfortunately this escaped our notice. The paper is now published in Nature: Huang, C., Luo, J., Woo, S.J. et al. Dopamine-mediated interactions between short- and long-term memory dynamics. Nature 634, 1141–1149 (2024). https://doi.org/10.1038/s41586-024-07819-w. We have now changed the citation. The new citation includes the missing access information.

      Reviewer #3 (Recommendations For The Authors):

      Regarding my issue about salt specificity in the public review, I want to make clear that I do not suggest additional experiments, but rather that the authors be very careful in phrasing the conclusions, in particular whenever referring to the experiments with optogenetic activation. This includes presenting these experiments as "(salt) substitution" experiments, implying that the optogenetic activation would substitute for a natural salt punishment. As important and interesting as the experiments are, they simply do not allow such an interpretation at this point.

      Results, line 140ff: When presenting the results regarding TH-Gal4 crossed to ChR2-XXL, please cite Schroll et al. 2006 who demonstrated the same results for the first time.

      Thanks for mentioning this. We now cite Schroll et al. 2006 here in the text of the manuscript.

      Figure 3: The subfigure labels (ABC) are missing.

      Unfortunately this escaped our notice. Thanks a lot – we have now corrected this mistake.

      Figure 5: For I and L, it reads "salt replaced with fru", but the sketch on the left shows salt in the test. I assume that fructose was not actually present in the test, and therefore the figure can be misleading. I suggest separate sketches. Also, I and L are not mentioned in the figure legend.

      We agree that this is confusing. Following this well-taken point, we have changed the figure by adding a new, correct scheme for sugar reward learning that does not depict fructose during the test.

      Figure S1: The experimental sketches for E,F and G,H seem to be mixed up.

      We thank the reviewer for bringing this up. In the new version we corrected this mistake.

      Figure S5: There are three sub-figures labelled with B. Please correct.

      Again, thanks a lot. We made the suggested correction in Figure S5.

      Discussion, line 353ff: this and the following sentences can be read as if the authors have discovered the DL-1 neurons as aversive teaching mediators in this study. However, Eschbach et al. 2020 already demonstrated very similar results regarding the optogenetic activation of single DL-1 DANs. I suggest to rephrase and cite Eschbach et al. 2020 at this point.

      That is correct. Our focus was on the gustatory pathway; the original discovery was made by Eschbach et al. We have now corrected this in the discussion and clarified our contribution. It was never our intention to obscure this work, in which our laboratory was also involved. Nevertheless, this was an unfortunate omission on our part.

      Line 385-387: this sentence is only correct with respect to Eschbach et al. 2020. Weiglein et al. 2021 used ChR2-XXL as an effector, but another training regimen.

      We understand this criticism and have changed the sentence as suggested by the reviewer. See also our response to concern 15 of reviewer 1.

      Line 389ff: I do not understand this sentence. What is meant by persistent and current suppression of activity? If this refers to the behavioural experiments, it is misleading as in the hid, reaper experiments neurons are ablated and not suppressed in activity.

      We made the requested changes in the text. It is true that the ablation of a neuron throughout larval life is different from constantly blocking the output of a persisting neuron.

      Methods, line 615 ff: the performance index is said to be calculated as the difference between the two preferences, but the equation shows the average of the preferences.

      Thanks a lot. We are sorry for the confusion. We have carefully rewritten this part of the methods section to avoid any misunderstanding.

      When discussing the organization of the DL1 cluster, on several occasions I have the impression the authors use the terms "redundant" and "combinatorial" synonymously. I suggest to be more careful here. Redundancy implies that each DAN in principle can "do the job", whereas combinatorial coding implies that only a combination of DANs together can "do the job". If "the job" is establishing an aversive salt memory, the authors' results point to redundancy: no experimental manipulation totally abolished salt learning, implying that the non-manipulated neurons in each experiment sufficed to establish a memory; and several DANs, when individually activated, can establish an aversive memory, implying that each of them indeed can "do the job".

      Based on this concern, we have rewritten the discussion to be more precise when referring to redundancy or combinatorial coding of the aversive teaching signal. Specifically, we have removed all references to combinatorial coding and replaced them with the term “redundancy”.

      The authors mix parametric and non-parametric statistical tests across the experiments dependent on whether the distribution of the data is normal or not. It would help readers if the authors would clearly state for which data which tests were used.

      We understand the criticism and have now added a supplemental file that includes all information on the statistical tests applied and the distribution of the data.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This study experimentally examined diet-microbe-host interactions through a complex systems framework, centered on dietary oxalate. Multiple, independent molecular, animal, and in vitro experimental models were introduced into this research. The authors found that microbiome composition influenced multiple oxalate-microbe-host interfaces. Oxalobacter formigenes was only effective against a poor oxalate-degrading microbiota background, which gives critical new insights into why clinical intervention trials with this species exhibit variable outcomes. Data suggest that, while heterogeneity in the microbiome impacts multiple diet-host-microbe interfaces, metabolic redundancy among diverse microorganisms in specific diet-microbe axes is a critical variable that may impact the efficacy of bacteriotherapies, which can help guide patient and probiotic selection criteria in probiotic clinical trials.

      Thank you. The main message of this research is that, through complex modelling, we believe we have identified the critical variable (metabolic redundancy) that is responsible for the efficacy of probiotics designed to reduce oxalate levels, thus allowing for improved patient selection in clinical trials. We also believe that this process and the critical features identified can be translated to other critical microbial functions, such as short-chain fatty acid synthesis, secondary bile acid synthesis, and others.

      Strengths:

      The paper has made significant progress in both the depth and breadth of scientific research by systematically comparing multiple experimental methods across multiple dimensions. Particularly through in-depth analysis from the enzymatic perspective, it has not only successfully identified several key strains and redundant genes, which is of great significance for understanding the functions of enzymes, the characteristics of strains, and the mechanisms of genes in microbial communities, but also provided a valuable reference for subsequent experimental design and theoretical research.

      More importantly, the establishment of a novel research approach to probiotics and gut microbiota in this paper represents a major contribution to the current research field. The proposal of this new approach not only breaks through the limitations of traditional research but also offers new perspectives and strategies for the screening, optimization of probiotics, and the regulation of gut microbiota balance. This holds potential significant value for improving human health and the prevention and treatment of related diseases.

      Thank you for the comments. We believe that the approach taken here, which contrasts with conventional reductionist techniques, will be critical for translating gut microbiome research into actionable therapeutic approaches.

      Weaknesses:

      While the study has excellently examined the overall changes in microbial community structure and the functions of individual bacteria, it lacks a focused investigation on the metabolic cross-feeding relationships between oxalate-degrading bacteria and related microorganisms, failing to provide a foundational microbial community or model for future research. Although this paper conducts a detailed study on oxalate metabolism, it would be beneficial to visually present the enrichment of different microbial community structures in metabolic pathways using graphical models.

      Thank you for this critique. In the current study, we broadly examined the response of the gut microbiota to dietary oxalate. Based on initial shotgun metagenomic results, we focused on specific taxa and metabolic functions. Through metagenomic and multiple culture-based studies, we quickly honed in on redundancy in oxalate-degrading function as a key feature of oxalate homeostasis. We believe that the defined microbial community we used for microbial transplants (particularly the taxonomic cohort) provides a strong, minimal community for exploring oxalate homeostasis further. In fact, we are using this consortium in multiple follow-up studies to fully understand the cross-feeding that may occur among these microorganisms, as you suggest. We note that Figure 3 shows the change in species and metabolic pathways with oxalate exposure.

      Furthermore, the authors have done a commendable job in studying the roles of key bacteria. If the interactions and effects of upstream and downstream metabolically related bacteria could be integrated, it would provide readers with even more meaningful information. By illustrating how these bacteria interact within the metabolic network, readers can gain a deeper understanding of the complex ecological and functional relationships within microbial communities. Such an integrated approach would not only enhance the scientific value of the study but also facilitate future research in this area.

      Thank you. We note that, based on the collective data obtained in this study, redundancy in oxalate degradation is the critical feature that maintains oxalate homeostasis. However, we are interested in potential metabolic interactions between microbes in our defined community and are currently investigating these interactions in detail.

      Reviewer #2 (Public review):

      Summary:

      Using the well-studied oxalate-microbiome-host system, the authors propose a novel conceptual and experimental framework for developing targeted bacteriotherapies using a three-phase pre-clinical workflow. The third phase is based on a 'complex system theoretical approach' in which multi-omics technologies are combined in independent in vivo and in vitro models to successfully identify the most pertinent variables that influence specific phenotypes in diet-host-microbe systems. The innovation lies in the third phase, since phases I and II are the dominant approaches everyone in the microbiome field uses.

      Thank you. As you note, the proposed phases I and II are the predominant approaches used. In fact, many clinical trials have been conducted to try to reduce urine oxalate in patients, based solely on mechanistic studies with Oxalobacter formigenes. As noted in our manuscript, only 43% of those studies resulted in the intended outcome, necessitating the approach we took in the current study. Our results suggest that the high rate of failure, despite well-established mechanisms, is due to insufficient patient selection that focused only on the presence or absence of O. formigenes, a species that normally exhibits very low prevalence and abundance in the human gut microbiota.

      Strengths:

      The authors used a multidisciplinary approach which included:

      (1) fecal transplant of two distinct microbial communities into Swiss-Webster mice (SWM) to characterize the host response (hepatic response-transcriptomics) and microbial activity (untargeted metabolomics of the stool samples) to different oxalate concentrations;

      (2) longitudinal analysis of the N. albigula gut microbiome composition in response to varying concentrations of oxalate by shotgun metagenomics, with deep bioinformatic analyses of the genomes assembled; and

      (3) development of synthetic microbial communities around oxalate metabolisms and evaluation of these communities' activity in oxalate degradation in vivo.

      Thank you for these comments.  In the complex modelling approach, we focused on complete microbiota from host species known to have high and low capacities for oxalate tolerance, combined with targeting specific metabolic functions vs. specific taxa that may include unknown functions important for oxalate metabolism.  Further, we examined the influence of our target communities on oxalate metabolism through multiple in vitro and in vivo studies.

      Weaknesses:

      However, I have concerns about the frame the authors tried to provide for a 'complex system theoretical approach' and how the data are interpreted within this frame. Several of the conclusions the authors provide do not seem to have sufficient data to support them.

      Thank you. We have tried to address these concerns by adding a comprehensive figure that broadly represents our complex modelling approach, including potential complex system-based hypotheses, how they were tested, and the host-microbiome-oxalate interactions found in our study.

      Recommendations for the authors:  

      Reviewer #2 (Recommendations for the authors):

      Major Concerns

      (1) The authors argue about the importance of bringing 'Complex System Theory' to the microbiome field systematically and consistently. However, the authors fail to introduce this theory throughout the entire manuscript. For example, the authors tried to describe key elements and their nomenclature, such as nodes and fractal layers, in the first part of the results section. But the description is wordy and not precise. It would be more useful if the authors connected the model description with a visual representation, such as a figure. Unfortunately, these elements are not emphasized or carried across the results section and are not mentioned in the discussion section.

      We have now added a figure (Figure 7) that details this process extensively and ties each of our findings to the complex system model and nomenclature.  We have also reiterated how our results fit in the complex system model in the discussion.

      In addition, there is no straightforward approach to integrating multi-omics datasets to identify the variables that are determinants of the system. For example, Figure 1 focuses on the impact of the host, hepatic activity, to oxalate exposure on fecal transplants into Swiss Webster mice; Figure 2 focuses on the effects of oxalate exposure on stool metabolic activity, not only microbial metabolic activity, on fecal transplants into Swiss Webster mice; and Figure 3 focuses on microbiome responses to different oxalate concentration in Neotoma albigula. There is no "model" to really integrate the host, the microbiome activity, and the microbiome composition information. And, unfortunately, the data generated between experiments cannot directly integrate; see major concern # 2.

      Thank you. We have clarified the experimental approach and how it was applied to understanding the critical factors that maintain oxalate homeostasis. Specifically, Figure 1 established that the effect of oxalate on the host was dependent on the microbiota, rather than on host genetics. Figure 2 established that the effect of oxalate on the gut microbiota was again dependent on the whole gut microbiota, and that these oxalate-microbe effects also influenced oxalate-host effects, as shown through direct multi-omic data integration. Once we established that the oxalate effects on host and microbiota were dependent on the whole microbiota composition, Figure 3 sought to determine how oxalate impacted the gut microbiota, using our model of high oxalate tolerance (N. albigula). Given the finding in Figure 3 that multiple genes were attributed to oxalate degradation or to acetogenic, methanogenic, and sulfate-reducing pathways, Figure 4 and the relevant supplemental figures sought to quantify the redundancy of these pathways. After establishing a very high degree of redundancy, we used a culturomic approach to determine which environmental factors impacted oxalate metabolism and to evaluate oxalate metabolism using our defined, hypothesized communities of microorganisms. Finally, Figure 6 sought to validate our metagenomic, metabolomic, and culturomic results from multiple animal and in vitro models using targeted microbial transplants in mice. While we did have some direct multi-omic data integration (Figures 2 and 3), the process employed here sought to systematically determine which factors were most important for the oxalate-microbiota-host relationship, and then to use those results to design the subsequent experiments. We have added this description to the discussion, which helps to contextualize the complex system modelling approach we took here.

      Finally, the authors did not provide a novel variable that successfully influences oxalate degradation in the oxalate-microbiome-host system. The authors argue that "both resource availability and community composition impact oxalate metabolism," which we can currently infer from the failure of the clinical trials, and they do not provide a clear intervention strategy to develop functional bacteriotherapy. The identification of composition as an important variable that was predictable without any multi-omics approach was highlighted by the development of synthetic microbial communities. Synthetic microbial communities are critical to characterizing complex microbiomes. Still, the authors did not explain how this strategy can be used in their theoretical framework (that is their goal), and these communities are not well introduced across the manuscript; see major concern # 4.

      As stated, it is clear from the failed clinical trials that we do not fully understand what microbial features dictate oxalate homeostasis.  We have specifically identified, through fecal transplant studies, that microbial composition is critical for oxalate homeostasis and that diverse oxalate-degrading bacteria exist.  However, ours is the first study that explicitly shows that it is this diversity that controls oxalate homeostasis.  This is specifically ascertained through the targeted microbial transplants in mice whereby O. formigenes was given alone or with different combinations of other microorganisms.  In other words, we were able to replicate both successful and failed studies by manipulating which specific species were introduced into animals.  This is unprecedented in the literature.

      (2) The authors provide several conclusions that are not completely supported by the data available. For example:

      (a) Lines 236-239: "Within the framework of complex systems, results show microbe-host cooperation whereby oxalate effectively processed within the SW-NALB gut microbiota reduced overall liver activity, indicative of a beneficial impact." - The authors did not provide data related to oxalate levels of oxalate processing for this dataset.

      While we did not specifically quantify oxalate degradation in this study, we have previously published multiple animal studies, cited in the text when describing this Swiss-Webster/Neotoma albigula system, explicitly showing that N. albigula animals are highly effective oxalate degraders and that this capacity is transferable to Swiss-Webster mice through fecal transplants. Since the gut microbiota’s impact on oxalate has been well established through experiments by our group, the purpose of these specific experiments was to look in the other direction and examine the effect of oxalate on the gut microbiota of these two animal models. In the referenced text, we again cited our studies showing that the SW-NALB system effectively degrades oxalate.

      (b) Lines 239-243: "Data also suggest that both the gut microbiota and the immune system are involved in oxalate remediation (redundancy), such that if oxalate cannot be neutralized in the gut microbiota or liver, then the molecule will be processed through host immune response mechanisms (fractality), in this case indicated through an overall increase in hepatic activity and specifically in mitochondrial activity." - The authors did not provide any evidence related to the immune system and oxalate metabolism.

      We corrected that statement as follows: “…in this case indicated through an overall increase in inflammatory cytokines with oxalate exposure combined with an ineffective oxalate-degrading microbiota (Figures S6a,b; S9a,b).” In other words, if the liver and gut microbiota cannot eliminate a toxin, then the immune system must deal with it through inflammatory pathways. Oxalate is a well-established pro-inflammatory compound. Our data show that this response is dependent on the gut microbiota.

      (c) Lines 250-252: "Following the diet trial, colon stool was collected post-necropsy and processed for untargeted metabolomics, which is a measure of total microbial metabolic output." - Although most metabolites in stool samples are indeed microbial, there are also host metabolites. So, it is not technically correct to relate the metabolomic analysis of stool samples to only microbial metabolic analysis. In addition, the authors discussed compounds such as alkaloids and cholesterol as microbial metabolites, which these compounds are more related to the diet and host correspondingly.

      We have corrected this to state: “total metabolites present in stool from the diet, microbial activity, and host activity”

      (d) Lines 270-273. "Specifically, the SW-NALB mice exhibit hallmarks of homeostatic feedback with oxalate exposure to maintain a consistent metabolic output, defined by the relatively small, net negative, microbial metabolite-hepatic gene network compared to the large, net positive, network of SW-SW mice." - How do the authors define oxalate homeostasis? In addition, do the authors imply feedback between the liver and the microbiome in which the microbiome responds to a liver response related to oxalate levels? Or could the observation in Figure 1 be explained just by microbial consumption of oxalate that would reduce the impact of oxalate that arrives at the liver?

      Oxalate homeostasis is defined in that sentence: “relatively small, net negative, microbial metabolite-hepatic gene network compared to the large, net positive, network of SW-SW mice”. In other words, for SW-NALB mice, oxalate did not produce a considerable change in either microbial or hepatic metabolic activity. We did not test the impact of the liver on the gut microbiota and therefore cannot comment on it. Based on the Figure 2 data, we believe that oxalate degradation alone does not explain the lack of change in hepatic activity in SW-NALB mice; rather, the oxalate-induced shift in gut microbiota metabolic activity broadly altered hepatic activity, as inferred from Figure 2c. We have made this clearer in the results: “suggests that the oxalate-induced change in microbial metabolism is responsible for the change in hepatic activity”.

      (e) Lines 297-301: "The oxalate-dependent metagenomic divergence of the NALB gut microbiota (Figure 3), combined with the lack of change in the microbial metabolomic profile with oxalate exposure (Figure 2), suggest that oxalate stimulates taxonomically diverse, but metabolically redundant microorganisms, in support of maintaining homeostasis." - The authors cannot conclude anything related between taxonomic changes and microbial activity since the taxonomic data presented is for microbial enrichment in N. albigula, and the "microbial activity data" is from the fecal transplantation experiment in SWM. These are two completely different systems with two completely different experimental designs.

      We have shown in multiple previous studies that oxalate induces a very similar taxonomic divergence of the NALB gut microbiota. The experiment in which oxalate produced a minimal, positive increase in microbial metabolites was based on the SW-NALB model, in which Swiss-Webster mice harbor an NALB microbiota. We show throughout the manuscript that the impact of oxalate is highly microbiota-dependent, which supports our claim. However, the claim is hypothesis-generating: that metabolic redundancy is important for oxalate homeostasis. We have modified our statement to make all of this clearer.

      Related to microbial composition, the authors did not show data validating the efficiency of the fecal transplantations (allograft or xenograft) in the SWM after antibiotic treatment. They also did not show evidence of microbial composition dynamics in response to oxalate exposure.

      Again, the efficacy of fecal transplants, used in the way they were here, has been shown in multiple past studies by our group. In those studies, we extensively characterized the microbiota from fecal transplants and identified which taxa were associated with oxalate levels. Therefore, that topic was not the focus of the current study, which instead focused on the impact of oxalate on gut microbiota activity. Our past studies, referenced multiple times throughout the current manuscript, were used in large part to help determine which microbes to include in our taxonomic cohort, as described in the manuscript.

      (f) Lines 301-303: "Given that data came from the same hosts sampled longitudinally, these data also reflect a microbiota that is adaptive to oxalate exposure, which is another important characteristic of complex systems." - In their dataset, what is the evidence that the microbiota of N. albigula is adapted to oxalate exposure? Is the increase in genomes with pathways related to oxalate metabolism related to an increase of oxalate in the diet? If so, does the microbiota exposure with a higher oxalate concentration decrease the systemic level of oxalate? In none of the experiments related to Figures 1 to 3 did the authors show a correlation of systemic oxalate levels with microbial composition, hepatic host response, or stool metabolism.

      Figure 3 explicitly shows the longitudinal impact of increasing levels of oxalate, with an increase in oxalate-degrading genes (Figure 3d). The specific samples selected for analysis here come from a previous study in which we explicitly quantified changes in gut microbiota composition and in both stool and urine oxalate at every time point listed in Figure 3a. This information is explicitly stated in the methods, together with the fact that “neither fecal nor urinary oxalate levels increased significantly.” Again, the effect of the gut microbiota on oxalate in these model systems has been extensively studied by our group and provides the foundation for the current study to examine the effect of oxalate on the gut microbiota and host.

      Considering my last two points, the authors do not present substantial evidence to support their hypothesis that oxalate stimulates taxonomically diverse, metabolically redundant communities.

      As stated above, that oxalate stimulates taxonomically diverse taxa was ascertained through multiple past studies, as well as the current study (Figure 3e). The metabolically redundant part is ascertained through both untargeted metabolomics (Figure 2a,b) and shotgun metagenomics (Figure 3c,d). Further evidence for metabolic redundancy with oxalate comes from our culturomic approach, which showed that 14.58% of isolates could grow on oxalate as a carbon and energy source, in addition to the high proportion of isolates that could grow on other carbon and energy sources, far more than can be ascribed to a single species (Figure 5c). We have made this clearer in the discussion.

      (g) Lines 330-335. "Additionally, the broad diversity of species that contain oxalate-related genes suggests that the distribution of metabolic genes is somewhat independent of the distribution of microbial species, which suggests that microbial genes exist in an autonomous fractal layer, to some degree. This hypothesis is supported by studies which show a high degree of horizontal gene transfer within the gut microbiota as a means of adaptation." - This conclusion is highly speculative, especially since the author did not do any analysis to directly evaluate a relationship between the oxalate metabolic pathways and the microbial species where these pathways are present.

      Figure 3c,d,e explicitly shows the metabolic pathways and species enriched by oxalate exposure.  Figure 4d, generated using the same data from Figure 3, explicitly shows the taxa that harbor oxalate-degrading genes.   

      (h) Lines 364-366. "Collectively, data show that both resource availability and community composition impacts oxalate metabolism, which helps to define the adaptive nature of the NALB gut microbiota." - The authors indeed showed evidence that community composition impacts oxalate metabolism. However, the authors did not show any evidence to directly evaluate the resource availability to impact oxalate metabolism.

      This is explicitly shown by in vitro community-based and single-species assays in which we varied multiple carbon and energy sources (chosen based on shotgun metagenomic results) to quantify changes in oxalate degradation (Figure 5a,b).

      (3) Lines 321-325. "Acetogenic genes were also present in 97.18% of genomes, dominated by acetate kinase and formate-tetrahydrofolate ligase (Figure S3A-C). Methanogenic genes were present in 100% of genomes, dominated by phosphoserine phosphatase, ATP-dependent 6-phosphofructokinase, and phosphate acetyltransferase (Figure S4A-C)." - The authors spent much time analyzing the adjacent pathways related to oxalate and oxalate-related products of oxalate metabolism. However, my understanding is that the genes used to analyze these pathways (formate metabolism, acetogenesis, methanogenesis), such as the ones named above, are not unique/specific for those pathways but participate in other "housekeeping" pathways. What is the relevance of these analyses when those genes are not unique/specific to the function/pathways that the authors describe? If I infer correctly, these bioinformatic analyses aim to evaluate the hypothesis of whether oxalate metabolism could be a social/cooperation metabolism and whether other species could participate in the metabolism of oxalate subproducts. However, these analyses did not explicitly evaluate this hypothesis.

      The reviewer is correct that we aimed to evaluate the potential for oxalate metabolism to benefit from metabolic cooperation. The specific genes chosen for this analysis were those explicitly listed in the target metabolic pathways in KEGG, as described. However, while the analyses do show a strong potential for the CO2 and formate produced by oxalate degradation to be used in these other pathways, as intended, the genes can also be used in other metabolic pathways. We did, however, explicitly test the hypothesis that formate produced by oxalate degradation could be utilized by the gut microbiota. While the targeted transplants with the taxonomic cohort did not clearly show the use of formate in this way, those with the metabolic cohort did (Figures 6d and S8d). This question is the subject of ongoing investigation in our group.

      We have made it more clear that our genome analyses provide the potential for metabolic redundancy rather than definitive proof for metabolic redundancy, which was evaluated more extensively in other experiments from this study.

      (a) Lines 481-484. "Collectively, data offer strong support for the hypothesis that metabolic redundancy among diverse taxa, is the primary driver of oxalate homeostasis, rather than metabolic cooperation in which the by-products of oxalate degradation are used in downstream pathways such as acetogenesis, methanogenesis, and sulfate reduction." - Although the authors recognize that their data about the metabolic cooperation hypothesis is inconclusive, they never tested the hypothesis related to metabolic cooperation, as mentioned above. This is highly speculative.

      As stated above, the targeted microbial transplants to animals and the in vitro studies (Figure 5e,f) did explicitly test the cooperation hypothesis, but the results did not support it and instead pointed much more strongly to metabolic redundancy.

      (4) Lines 355-359. "Cohorts, defined in the STAR methods, were used to delineate hypotheses that either carbon and energy substrates are sufficient to explain known effects of the oxalate-degrading microbial network or that additional aspects of taxa commonly stimulated by dietary oxalate are required to explain past results (taxa defined through previous meta-analysis of studies)." - The definition of the metabolic cohorts and the taxonomic cohorts should not be hidden in the material and methods section. It should be explicit and clearly explained in the main text. Related, the table presented in Figure 5D is exceptionally confusing and does not help to understand and differentiate between the metabolic and the taxonomic cohorts. The authors need to explicitly identify the synthetic communities used in each cohort and each group by their members and their characteristics in supplementary tables.

      In the sentences before those referenced, we state: “Culturomic data recapitulates molecular data to show a considerable amount of redundancy surrounding oxalate metabolism (Fig. 5C). Isolates generated from this assay were used for subsequent study (metabolic cohort; Figure 5D). Additionally, a second cohort was defined and commercially purchased based both on known metabolic functions and the proportion of studies that saw an increase in their taxonomic population with oxalate consumption (Fig. 5D; taxonomic cohort). Where possible, isolates from human sources were obtained.” Figure 5D explicitly shows the species used in each cohort, the groups to which they were assigned for the transplant studies, the metabolic pathways we targeted, and the percentage of studies in which each species was associated with oxalate metabolism. All of this information is in both the main text of the results and the figure legends. It is not hidden in the methods; the methods simply reiterate what is also presented in the results.

      In Figures 5 and 6, the authors used the following groups with the corresponding nomenclature: 'Group 1, No_bact; Group 2, Ox; Group 3, Ox_form; Group 4, All; Group 5, No_ox'. Although the information related to these groups is present in the Materials and Methods section in lines 1139-1143, the authors also need to explicitly explain the groups and their nomenclature in the main text.

      Since this information is explicitly and succinctly given in the referenced figures, I believe that adding the same information in the text would be too redundant.

      Related to the development of the synthetic communities. How did the authors prepare the synthetic communities or 'cohort' for the in vitro experiments? 

      We added more information on the preparation of microbes and the execution of the in vitro assays, as needed.

      Also, it is unclear in the Materials and Methods section how the metabolic profile of each isolate was evaluated (Figure 5C). Related to the bacteria isolated from the culturomic assays, including Figure 5C and the metabolic cohort, the authors indeed reported the isolation methodology in lines 1262-1275. However, there is no information about the sequencing of these isolates. The authors should present these isolates as a list (supplementary table) with their names, taxonomy, metabolic profile, and Genome ID if these genomes were submitted to NCBI.

      We added information on how metabolic cohort isolates were chosen and how they were taxonomically identified. The taxonomy and substrate utilization of isolates are in Figure 5D. We did not sequence the genomes of metabolic cohort bacteria. However, the ATCC isolates, which comprise the taxonomic cohort, are publicly available.

      The authors presented the 248 metagenomic assemblies in Figure S1 in a circular chart in context with other genomes. However, the metagenomic assemblies should be presented in table form, with their name, taxonomy, coverage, completeness, and Genome ID, if these genomes were submitted to NCBI.

      The information for the genomes submitted to the NCBI is provided in the data availability statement.  However, we added a table (Table S9) that includes the requested information.   

      (5) Lines 371-374: "To delineate hypotheses of metabolic redundancy or cooperation for mitigating the negative effects of oxalate on the gut microbiota and host, two independent diet trials were conducted with analogous microbial communities derived from the metabolic and taxonomic cohorts".

      Lines 494-496: "we and others have found that oxalate can differentially exhibit positive or negative effects on microbial growth and metabolism dependent on the species and environment present" - What is the evidence that oxalate has a negative effect on the gut microbiota? The authors clearly showed the negative effect of oxalate on the host. Although there are reports in the literature of oxalate consumers with a negative effect on the microbiome, such as Lactobacilli and Bifidobacteria, there is no evidence in this manuscript about a negative effect of oxalate on the microbiome, and there is not an experimental design to evaluate it.

      These data are presented in Figures 2A and 2B. As stated, oxalate led to a net reduction of 34 microbial metabolites, with a significant shift in the overall metabolome, indicative of metabolic inhibition. This contrasts with a net gain of 9 metabolites, and no significant overall shift, in the mice with the NALB microbiota. The positive and negative effects of oxalate on the whole gut microbiota here are bolstered by previous studies on the effect of oxalate on pure cultures, as discussed and cited on lines 623-624.

      (6) Related to the last section, it is hard to really compare the results of the taxonomic cohort versus the metabolic cohort when the data of one cohort is in the main figure and the other in a supplementary figure. In addition, all the comparisons between the two cohorts seem to be qualitative. For any comparisons, the authors need to do a statistical comparison between the groups of the two cohorts.

      The comparison of the two sets of data is indeed qualitative. This is because these mouse models were run in separate experiments to test separate hypotheses (whether utilization of specific substrates is enough to improve oxalate metabolism, or whether specific taxa previously shown to respond to dietary oxalate perform better, as stated in the manuscript). Given that these experimental models were tested separately, a direct statistical comparison would not be valid, even though the experimental procedures were the same and the only difference was the transplanted bacteria. The separation of the experiments into a main and a supplemental figure was done out of necessity, given the very large amount of data and the many experimental mouse models run in this study overall.

      Minor Comments.

      (1) The authors should define 'antinutrients'. This term is not a familiar concept and could create confusion.

      This is defined in line 104 “molecules produced in plants to deter herbivory, disrupt homeostasis by targeting the function of the microbiome, host, or both”

      (2) The authors should explicitly describe the N. albigula (white-throated woodrat) system as early as possible in the Results section.

      We added some statements about the Swiss Webster and N. albigula gut microbiota as poor and effective oxalate degraders, respectively, in the second section of the results.

      (3) SW-SW mice exhibited an oxalate-dependent alteration of 219 hepatic genes, with a net increase in activity. In comparison, the SW-NALB mice exhibited an oxalate-dependent alteration of 21 genes with a net decrease in activity. However, the visual representation of the PCoA in Figure 1B showed that the most different samples are the SW-NALB 0% and 1.5%. Could you please explain this difference?

      In Figure 1b, the SW-NALB data are represented by the blue and black data points, which directly overlap with each other.  The SW-SW data are the orange and purple data points, which exhibit very little overlap.  

      (4) Is Table S7 the same as Table S6? If not, there is a missing supplementary table.

      These tables are different.  We ensured that both are present.

      (5) How did the authors test bacterial growth in in vivo studies (Figure 5B)?

      We added a statement to the culturomic section of the methods – we used media with or without oxalate and quantified colony-forming units.

      (6) A section of 16S rRNA metagenomics in the material and method section is not used across the main manuscript.

      These data are presented in figures S7 and S10, as stated in the results.  We added statements in the results to clarify that these figures show the 16S sequencing data.

      (7) Lines 506-511: "Collectively, data from the current and previous studies on the effect of oxalate exposure on the gut microbiota support the hypothesis that the gut microbiota serves as an adaptive organ in which specific, metabolically redundant microbes respond to and eliminate dietary components, for the benefit of themselves, but which can residually protect or harm host health depending on the dietary molecules and gut microbiota composition." - What is the benefit to bacteria in eliminating oxalate? This is highly speculative to this system.

      The benefit to bacteria is stated earlier in that paragraph – “In the current (Figs. 2B, 5B) and previous studies(33,34,64,65), we and others have found that oxalate can differentially exhibit positive or negative effects on microbial growth and metabolism dependent on the species and environment present.”

      (8) Lines 504-506: "Importantly, the near-universal presence of formate metabolism genes suggest that formate may be an even greater source of ecological pressure (Figures S2-S5)."

      - Formate is primarily produced by fermentative anaerobic bacteria, such as Bacteroides, Clostridia, and certain strains of Escherichia coli, so formate would be present in anaerobic communities independently of oxalate. How is formate an even greater source of ecological pressure?

      We added a statement about the toxicity of formate to both bacteria and mammalian hosts.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary

      In this study, the authors build upon previous research that utilized non-invasive EEG and MEG by analyzing intracranial human ECoG data with high spatial resolution. They employed a receptive field mapping task to infer the retinotopic organization of the human visual system. The results present compelling evidence that the spatial distribution of human alpha oscillations is highly specific and functionally relevant, as it provides information about the position of a stimulus within the visual field.

      Using state-of-the-art modeling approaches, the authors not only strengthen the existing evidence for the spatial specificity of the human dominant rhythm but also provide new quantification of its functional utility, specifically in terms of the size of the receptive field relative to the one estimated based on broad band activity.

      We thank the reviewer for their positive summary.

      Weakness 1.1

      The present manuscript currently omits the complementary view that the retinotopic map of the visual system might be related to eye movement control. Previous research in non-human primates using microelectrode stimulation has clearly shown that neuronal circuits in the visual system possess motor properties (e.g. Schiller and Stryker 1972, Schiller and Tehovnik 2001). More recent work utilizing Utah arrays, receptive field mapping, and electrical stimulation further supports this perspective, demonstrating that the retinotopic map functions as a motor map. In other words, neurons within a specific area responding to a particular stimulus location also trigger eye movements towards that location when electrically stimulated (e.g. Chen et al. 2020).

      Similarly, recent studies in humans have established a link between the retinotopic variation of human alpha oscillations and eye movements (e.g., Quax et al. 2019, Popov et al. 2021, Celli et al. 2022, Liu et al. 2023, Popov et al. 2023). Therefore, it would be valuable to discuss and acknowledge this complementary perspective on the functional relevance of the presented evidence in the discussion section.

      The reviewer notes that we do not discuss the oculomotor system and alpha oscillations. We agree that the literature relating eye movements and alpha oscillations is relevant.

      At the Reviewer’s suggestion, we added a paragraph on this topic to the first section of the Discussion (section 3.1, “Other studies have proposed … “).

      Reviewer #2 (Public Review):

      Summary:

      In this work, Yuasa et al. aimed to study the spatial resolution of modulations in alpha frequency oscillations (~10Hz) within the human occipital lobe. Specifically, the authors examined the receptive field (RF) tuning properties of alpha oscillations, using retinotopic mapping and invasive electroencephalogram (iEEG) recordings. The authors employ established approaches for population RF mapping, together with a careful approach to isolating and dissociating overlapping, but distinct, activities in the frequency domain. In particular, the authors dissociate genuine changes in alpha oscillation amplitude from other superimposed changes occurring over a broadband range of the power spectrum. Together, the authors used this approach to test how spatially tuned estimated RFs were when based on alpha range activity, vs. broadband activities (focused on 70-180Hz). Consistent with a large body of work, the authors report clear evidence of spatially precise RFs based on changes in alpha range activity. However, the size of these RFs was far larger than that reliably estimated using broadband range activity at the same recording site. Overall, the work reflects a rigorous approach to a previously examined question, for which improved characterization leads to improved consistency in findings and some advance over prior work.

      We thank the reviewer for the summary.

      Strengths:

      Overall, the authors take a careful and well-motivated approach to data analyses. The authors successfully test a clear question with a rigorous approach and provide strong supportive findings. Firstly, well-established methods are used for modeling population RFs. Secondly, the authors employ contemporary methods for dissociating unique changes in alpha power from superimposed and concomitant broadband frequency range changes. This is an important confound in estimating changes in alpha power not employed in prior studies. The authors show this approach produces more consistent and robust findings than standard band-filtering approaches. As noted below, this approach may also account for more subtle differences when compared to prior work studying similar effects.

      We thank the reviewer for the positive comments.

      Weaknesses:

      Weakness 2.1 Theoretical framing:

      The authors frame their study as testing between two alternative views on the organization, and putative functions, of occipital alpha oscillations: i) alpha oscillation amplitude reflects broad shifts in arousal state, with large spatial coherence and uniformity across cortex; ii) alpha oscillation amplitude reflects more specific perceptual processes and can be modulated at local spatial scales. However, in the introduction this framing seems mostly focused on comparing some of the first observations of alpha with more contemporary observations. Therefore, I read their introduction to more reflect the progress in studying alpha oscillations from Berger's initial observations to the present. I am not aware of a modern alternative in the literature that posits alpha to lack spatially specific modulations. I also note this framing isn't particularly returned to in the discussion.

      This was helpful feedback. We have rewritten nearly the entire Introduction to frame the study differently. The emphasis is now on the fact that several intracranial studies of spatial tuning of alpha (in both human and macaque) tend to show increases in alpha due to visual stimulation, in contrast to a century of MEG/EEG studies, from Berger to the present, showing decreases. We believe that the discrepancy is due to an interaction between measurement type and brain signals. Specifically, intracranial measurements sum decreases in alpha oscillations and increases in broadband power on the same trials, and both signals can be large. In contrast, extracranial measures are less sensitive to the broadband signals and mostly just measure the alpha oscillation. Our study reconciles this discrepancy by removing the baseline broadband power increases, thereby isolating the alpha oscillation, and showing that with iEEG spatial analyses, the alpha oscillation decreases with visual stimulation, consistent with EEG and MEG results.
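      The core idea of this response, separating a broadband power shift from the alpha oscillation within the same spectrum, can be sketched as a curve fit. This is a minimal illustration under stated assumptions, not the authors' Equations 1 and 2: the model form (a straight line in log power versus log frequency plus a Gaussian over log frequency, fit to the log of the stimulus/blank power ratio), the initial guesses, and the names `log_ratio_model` and `fit_alpha_component` are all hypothetical. A negative `alpha_height` corresponds to alpha suppression that remains after the broadband increase has been accounted for.

```python
import numpy as np
from scipy.optimize import curve_fit


def log_ratio_model(log_f, offset, slope, alpha_height, alpha_center, alpha_width):
    """Hypothetical model of log10(stimulus power / blank power).

    The broadband change is a line in log-log space; the alpha component is a
    Gaussian bump over log10 frequency whose height can be negative (suppression).
    """
    broadband = offset + slope * log_f
    alpha = alpha_height * np.exp(-((log_f - alpha_center) ** 2) / (2.0 * alpha_width ** 2))
    return broadband + alpha


def fit_alpha_component(freqs, power_ratio, alpha_guess_hz=10.0):
    """Fit the hypothetical model and return broadband and alpha parameters."""
    log_f = np.log10(np.asarray(freqs, dtype=float))
    log_r = np.log10(np.asarray(power_ratio, dtype=float))
    # Start from no broadband change and a small alpha dip near alpha_guess_hz.
    p0 = [0.0, 0.0, -0.1, np.log10(alpha_guess_hz), 0.05]
    params, _ = curve_fit(log_ratio_model, log_f, log_r, p0=p0, maxfev=10000)
    offset, slope, height, center, width = params
    return {
        "broadband_offset": offset,
        "broadband_slope": slope,
        "alpha_height": height,
        "alpha_peak_hz": 10.0 ** center,
        "alpha_width": abs(width),  # the model is symmetric in the sign of the width
    }
```

      The design point is that the broadband offset and the alpha height are estimated jointly: simply averaging the log ratio over an alpha band would fold any broadband increase into the alpha estimate and could mask a genuine alpha decrease.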

      Weakness 2.2 A second important variable here is the spatial scale of measurement.

      It follows that EEG-based studies will capture changes in alpha activity up to the limits of spatial resolution of the method (i.e. limited in ability to map RFs). This methodological distinction isn't as clearly mentioned in the introduction, but is part of the authors' motivation. Finally, as noted below, there are several studies in the literature specifically addressing the authors' question, but they are not discussed in the introduction.

      The new Introduction now explicitly contrasts EEG/MEG with intracranial studies and refers to the studies below.

      Weakness 2.3 Prior studies:

      There are important findings in the literature preceding the authors' work that are not sufficiently highlighted or cited. In general terms, the spatio-temporal properties of the EEG/iEEG spectrum are well known (i.e. that changes in high frequency activity are more focal than changes in lower frequencies). Therefore, the observations of spatially larger RFs for alpha activities is highly predicted. Specifically, prior work has examined the impact of using different frequency ranges to estimate RF properties, for example ECoG studies in the macaque by Takaura et al. NeuroImage (2016) [PubMed: 26363347], as well as prior ECoG work by the authors' team of collaborators (Harvey et al., NeuroImage (2013) [PubMed: 23085107]), as well as more recent findings from other groups (Luo et al., (2022) BioRxiv: https://doi.org/10.1101/2022.08.28.505627). Also, a related literature exists for invasively examining RF mapping in the time-voltage domain, which provides some insight into the authors' findings (as this signal will be dominated by low-frequency effects). The authors should provide a more modern framing of our current understanding of the spatial organization of the EEG/iEEG spectrum, including prior studies examining these properties within the context of visual cortex and RF mapping. Finally, I do note that the authors' approach to these questions does reflect an important test of prior findings, via an improved approach to RF characterization and iEEG frequency isolation, which suggests some important differences with prior work.

      Thank you for these references and suggestions. Some of the references were already included, and the others have been added.

      There is one issue where we disagree with the Reviewer, namely that “the observations of spatially larger RFs for alpha activities is highly predicted”. We agree that alpha oscillations and other low frequency rhythms tend to be less focal than high frequency responses, but there are also low frequency non-rhythmic signals, and these can be spatially focal. We show this by demonstrating that pRFs solved using low frequency responses outside the alpha band (both below and above the alpha frequency) are small, similar to high frequency broadband pRFs, but differing from the large pRFs associated with alpha oscillations. Hence we believe the degree to which signals are focal is more related to the degree of rhythmicity than to the temporal frequency per se. While some of these results were already in the supplement, we now address the issue more directly in the main text in a new section called, “2.5 The difference in pRF size is not due to a difference in temporal frequency.”

      We incorporated additional references into the Introduction, added a new section on low frequency broadband responses to the Results (section 2.5), and expanded the Discussion (section 3.2) to address these new references.

      Weakness 2.4 Statistical testing:

      The authors employ many important controls in their processing of data. However, for many results there is only a qualitative description or summary metric. It appears very little statistical testing was performed to establish reported differences. Related to this point, the iEEG data is highly nested, with multiple electrodes (observations) coming from each subject, how was this nesting addressed to avoid bias?

      We reviewed the primary claims made in the manuscript and for each claim, we specify the supporting analyses and, where appropriate, how we address the issue of nesting. Although some of these analyses were already in the manuscript, many of them are new, including all of the analyses concerning nesting. We believe that putting this information in one place will be useful to the reader, and we now include this text as a new section in the supplement, Graphical and statistical support for primary claims.

      Reviewer #2 (Recommendations For The Authors):

      Recommendation 2.1:

      Data presentation: In several places, the authors discuss important features of cortical responses as measured with iEEG that need to be carefully considered. This is totally appropriate and a strength of the authors' work; however, I feel the reader would benefit from more depiction of the time-domain responses, to help better understand the authors' frequency-domain approach. For example, Figure 1 would benefit from showing some form of voltage trace (ERP) and spectrogram, not just the power spectra. In addition, part (a) of Figure 1 could convey some basic information about the timing of the experimental paradigm.

      We changed panel A of Figure 1 to include the timing of the experimental paradigm, and we added panels C and D to show the electrode time series before and after regressing out the ERP.

      Recommendation 2.2

      Update introduction to include references to prior EEG/iEEG work on spatial distribution across frequency spectrum, and importantly, prior work mapping RFs with different frequencies.

      We have addressed this issue and re-written our introduction. Please refer to our response in Public Review for further details.

      Recommendation 2.3

      Figure 3 has several panels and should be labeled to make it easier to follow. The dashed line in the lower power spectra isn't defined in a legend and is missing from the upper panel - please clarify.

      We updated Figure 3 and reordered the panels to clarify how we computed the summary metrics in broadband and alpha for each stimulus location (i.e., the “ratio” values plotted in panel B). We also simplified the plot of the alpha power spectrum. It now shows a dashed line representing a baseline-corrected response to the mapping stimulus, which is defined in the legend and explained in the caption.

      Recommendation 2.4

      Power spectra are always shown without error shading, but they are mean estimates.

      We added error shading to Figures 1, 2 and 3.

      Recommendation 2.5

      The authors deal with voltage transients in response to visual stimulation by subtracting out the trial-averaged mean (commonly performed). However, the efficacy of this approach depends on signal quality and so some form of depiction for this processing step is needed.

      We added a depiction of the processing steps for regressing out the averaged responses in Figure 1 in an example electrode (panels C and D). We also show in the supplement the effect of regressing out the ERP on all the electrode pRFs. We have added Supplementary Figure 1-2.
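      The processing step discussed here, removing the stimulus-locked evoked response before spectral analysis, can be sketched as a per-trial regression against the trial-averaged response. This is a hypothetical illustration: the function name `regress_out_erp` and the use of one least-squares scale factor per trial are assumptions, and the authors' exact procedure is the one described in their methods.

```python
import numpy as np


def regress_out_erp(trials):
    """Remove a scaled copy of the trial-averaged evoked response from each trial.

    trials: array of shape (n_trials, n_samples) for one electrode and condition.
    Returns (residuals, betas), where betas[i] is the least-squares weight of the
    average evoked response in trial i.
    """
    trials = np.asarray(trials, dtype=float)
    erp = trials.mean(axis=0)
    erp_energy = np.dot(erp, erp)
    if erp_energy == 0.0:
        # Nothing to regress out if the average response is flat at zero.
        return trials.copy(), np.zeros(trials.shape[0])
    # Least-squares weight per trial: <trial_i, erp> / <erp, erp>.
    betas = trials @ erp / erp_energy
    residuals = trials - betas[:, np.newaxis] * erp[np.newaxis, :]
    return residuals, betas
```

      Unlike plain subtraction of the mean, the per-trial weight lets trials with a larger or smaller evoked response be corrected without over- or under-subtracting; if every weight were fixed at 1, the two approaches would coincide.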

      Recommendation 2.6

      I have a similar request for the authors' latency correction of their data, where they identified a timing error and re-aligned the data without ground truth. Again, this is appropriate, but some depiction of the success of this correction is critical for confirming the integrity of the data.

      We now report more detail on the latency correction, and also point out that any small error in the estimate would not affect our conclusions (4.6 ECoG data analysis | Data epoching). The correction was important for a prior paper on temporal dynamics (Groen et al, 2022), which used data from the same participants and estimated the latency of responses. In this paper, our analyses are in the spectral domain (and discard phase), so small temporal shifts are not critical. We now also link to the public code associated with that paper, which implemented the adjustment and quantified the uncertainty in the latency adjustment.

      More details on latency adjustment provided in section 4.6.

      Recommendation 2.7

      In many places the authors report their data shows a 'summary' value, please clarify if this means averaging or summation over a range.

      For both broadband and alpha, we derive one summary value (a scalar) per trial for each stimulus. For broadband, the summary metric is the ratio of power during a given trial to power during blanks (where power in a trial is the geometric mean of the power at each frequency within the defined band). This is Equation 3 in the methods, which is now referenced the first time that summary metrics are mentioned in the results. For alpha, the summary metric is the height of the Gaussian from our model-based approach. This is given in Equations 1 and 2, which are also now referenced the first time summary metrics are mentioned in the results.

      We added explanation of the summary metrics in the figure captions and results where they are first used, and also referred to the equations in the methods where they are defined.
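      The broadband summary metric described above, a ratio of geometric-mean power, can be written out as a short sketch. The 70-180 Hz default band follows the broadband range mentioned in Reviewer #2's summary; treating the blank baseline as the geometric mean over all blank trials and in-band frequencies, and the name `broadband_ratio`, are assumptions rather than a transcription of Equation 3.

```python
import numpy as np


def broadband_ratio(freqs, trial_power, blank_power, band=(70.0, 180.0)):
    """Per-trial broadband summary: geometric-mean in-band power over blank baseline.

    freqs: (n_freqs,) frequencies in Hz.
    trial_power: (n_trials, n_freqs) power spectra for stimulus trials.
    blank_power: (n_blanks, n_freqs) power spectra for blank trials.
    Returns an (n_trials,) array of ratios; 1.0 means no change from blank.
    """
    freqs = np.asarray(freqs, dtype=float)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Geometric mean = exp of the mean log power across in-band frequencies.
    trial_geo = np.exp(np.mean(np.log(np.asarray(trial_power)[:, in_band]), axis=1))
    # Assumed baseline: geometric mean over every blank trial and in-band frequency.
    blank_geo = np.exp(np.mean(np.log(np.asarray(blank_power)[:, in_band])))
    return trial_geo / blank_geo
```

      Averaging in log space means each frequency in the band contributes equally to the summary, rather than letting the lower, higher-power frequencies of the band dominate an arithmetic mean.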

      Recommendation 2.8

      The authors conclude: "we have discovered that spectral power changes in the alpha range reflect both suppression of alpha oscillations and elevation of broadband power." It might not have been the intention, but 'discovered' seems overstated.

      We agree and changed this sentence.

      Recommendation 2.9

      Supp Fig 9 is a great effort by the authors to convey their findings to the reader, it should be a main figure.

      We are glad you found Supplementary Figure 9 valuable. We moved this figure to the main text.

      Reviewer #3 (Public Review):

      Summary:

      This study tackles the important subject of sensory driven suppression of alpha oscillations using a unique intracranial dataset in human patients. Using a model-based approach to separate changes in alpha oscillations from broadband power changes, the authors try to demonstrate that alpha suppression is spatially tuned, with similar center location as high broadband power changes, but much larger receptive field. They also point to interesting differences between low-order (V1-V3) and higher-order (dorsolateral) visual cortex. While I find some of the methodology convincing, I also find significant parts of the data analysis, statistics and their presentation incomplete. Thus, I find that some of the main claims are not sufficiently supported. If these aspects could be improved upon, this study could potentially serve as an important contribution to the literature with implications for invasive and non-invasive electrophysiological studies in humans.

      We thank the reviewer for the summary.

      Strengths:

      The study utilizes a unique dataset (ECoG & high-density ECoG) to elucidate an important phenomenon of visually driven alpha suppression. The central question is important and the general approach is sound. The manuscript is clearly written and the methods are generally described transparently (and with reference to the corresponding code used to generate them). The model-based approach for separating alpha from broadband power changes is especially convincing and well-motivated. The link to exogenous attention behavioral findings (figure 8) is also very interesting. Overall, the main claims are potentially important, but they need to be further substantiated (see weaknesses).

      We thank the reviewer for the positive comments.

      Weaknesses:

      I have three major concerns:

      Weakness 3.1. Low N / no single subject results/statistics:

      The crucial results of Figure 4,5 hang on 53 electrodes from four patients (Table 2). Almost half of these electrodes (25/53) are from a single subject. Data and statistical analysis seem to just pool all electrodes, as if these were statistically independent, and without taking into account subject-specific variability. The mean effect per each patient was not described in text or presented in figures. Therefore, it is impossible to know if the results could be skewed by a single unrepresentative patient. This is crucial for readers to be able to assess the robustness of the results. N of subjects should also be explicitly specified next to each result.

      We have added substantial changes to deal with subject specific effects, including new results and new figures.

      • Figure 4 now shows variance explained by the alpha pRF broken down by each participant for electrodes in V1 to V3. We also now show a similar figure for dorsolateral electrodes in Supplementary Figure 4-2.

      • Figure 5, which shows results from individual electrodes in V1 to V3, now includes color coding of electrodes by participant to make it clear how the electrodes group by participant. Similarly, for dorsolateral electrodes, we show electrodes grouped by participant in Supplementary Figure 5-1. The same applies to Supplementary Figure 6-2.

      • Supplementary Figure 7-2 now shows the benefits of our model-based approach for estimating alpha broken down by individual participants.

      • We also now include a new section in the supplement that summarizes for every major claim, what the supporting data are and how we addressed the issue of nesting electrodes by participant, section Graphical and statistical support for primary claims.

      Weakness 3.2. Separation between V1-V3 and dorsolateral electrodes:

      Out of 53 electrodes, 27 were doubly assigned as both V1-V3 and dorsolateral (Table 2, Figures 4,5). That means that out of 35 V1-V3 electrodes, 27 might actually be dorsolateral. This problem is exacerbated by the low N. For example, all the 20 electrodes in patient 8 assigned as V1-V3 might as well be dorsolateral. This double assignment didn't make sense to me and I wasn't convinced by the authors' reasoning. I think it needlessly inflates the N for comparing the two groups and casts doubts on the robustness of these analyses.

      Electrode assignment was probabilistic to reflect uncertainty in the mapping between location and retinotopic map. The probabilistic assignment is handled in two ways.

      (1) For visualizing results of single electrodes, we simply go with the maximum probability, so no electrode is visualized for both groups of data. For example, Figure 5a (V1-V3) and supplementary Figure 5-1a (dorsolateral electrodes) have no electrodes in common: no electrode is in both plots.

      (2) For quantitative summaries, we sample the electrodes probabilistically (for example Figures 4, 5c). So, if for example, an electrode has a 20% chance of being in V1 to V3, and 30% chance of being in dorsolateral maps, and a 50% chance of being in neither, the data from that electrode is used in only 20% of V1-V3 calculations and 30% of dorsolateral calculations. In 50% of calculations, it is not used at all. This process ensures that an electrode with uncertain assignment makes no more contribution to the results than an electrode with certain assignment. An electrode with a low probability of being in, say, V1-V3, makes little contribution to any reported results about V1-V3. This procedure is essentially a weighted mean, which the reviewer suggests in the recommendations. Thus, we believe there is not a problem of “double counting”.
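To make the equivalence concrete, here is a minimal Python sketch. It is not the analysis code used in the paper: the electrode values, assignment probabilities, and function names are hypothetical. Including each electrode in a group on each draw with its assignment probability, and pooling included values across draws, converges to the probability-weighted mean that Reviewer #3 suggests, so an electrode's influence is capped by its assignment probability.

```python
import random

def weighted_mean(values, probs):
    """Probability-weighted mean: each electrode counts in proportion to its assignment probability."""
    return sum(v * p for v, p in zip(values, probs)) / sum(probs)

def sampled_mean(values, probs, n_iter=20000, seed=0):
    """Monte Carlo version: on each draw every electrode is included with
    probability p; included values are pooled across all draws."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n_iter):
        for v, p in zip(values, probs):
            if rng.random() < p:  # electrode assigned to this group on this draw
                total += v
                count += 1
    return total / count

# Hypothetical pRF sizes (deg) and probabilities of belonging to V1-V3.
sizes = [2.0, 4.0, 8.0]
p_v1v3 = [0.9, 0.5, 0.2]

print(round(weighted_mean(sizes, p_v1v3), 3))  # 3.375
print(round(sampled_mean(sizes, p_v1v3), 2))
```

Pooling across draws is used here so that the Monte Carlo estimate targets exactly the weighted mean; averaging per-draw group means instead gives a close but not identical quantity when group sizes vary from draw to draw.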

      The alternative would have been to use maximum probability for all calculations. However, we think that doing so would be misleading, since it would not take into account uncertainty of assignment, and would thus overstate differences in results between the maps.

      We now clarify in the Results that for probabilistic calculations, the contribution of an electrode is limited by the likelihood of assignment (Section 2.3). We also now explain in the methods why we think probabilistic sampling is important.

      Weakness 3.3. Alpha pRFs are larger than broadband pRFs:

      First, as broadband pRF models were on average better fits to the data than alpha pRF models (dark bars in Supp Fig 3, top row), I wonder if this could entirely explain the larger alpha pRFs (i.e., worse fits lead to larger pRFs). There was no analysis to rule out this possibility.

      We addressed this question in a new paragraph in Discussion section 3.1 (“What is the function of the large alpha pRFs?”, paragraph beginning… “Another possible interpretation is that the poorer model fit in the alpha pRF is due to lower signal-to-noise”). This paragraph both refers to prior work on the relationship between noise and pRF size and to our own control analyses (Supplementary Figure 5-2).

      Weakness 3.4. Statistics:

      Second, examining the entire Section 2.4 closely, I found no formal statistical test backing up any of the claims (not a single p-value is mentioned). In my opinion, it is crucial to support each of the main claims of the paper with formal statistical testing.

      We agree that it is important for the reader to be able to link specific results and analyses to specific claims. We are not convinced that null hypothesis statistical testing is always the best approach. This is a topic of active debate in the scientific community.

      We added a new section (Section 4.7) that concisely states each major claim and explicitly annotates the supporting evidence. Please also refer to our responses to Reviewer #2 regarding statistical testing (Reviewer weakness 2.4, “Statistical testing”).

      Weakness 3.5. Summary:

      While I judge these issues as crucial, I can also appreciate the considerable effort and thoughtfulness that went into this study. I think that addressing these concerns will substantially raise the confidence of the readership in the study's findings, which are potentially important and interesting.

      We again thank the reviewer for the positive comments.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for how to address the three major concerns:

      Suggestion 3.1.

      I am very well aware that it's very hard to have n=30 in a visual cortex ECoG study. That's fine. Best practice would be a linear mixed-effects model with patients as a random effect. However, for some figures with just 3-4 patients (Figures 4, 5) the sample size might be too small even for that. At the very minimum, I would expect all results to be shown per patient in the figures or described per patient in the text (perhaps one can do statistics within each patient and show that the effect is significant in each patient). Even in primate studies with just two subjects, it is expected that the results replicate in both subject A and subject B. It is necessary to show that your results don't depend on a single unrepresentative subject. And if they do, at least be transparent about it.

      We have addressed this thoroughly. Please see response to Weakness 3.1 (“Low N / no single subject results/statistics”).
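One way to produce the kind of per-participant breakdown the reviewer asks for is sketched below in Python. It is illustrative only: the record layout (participant, alpha pRF size, broadband pRF size), the values, and `per_participant_effect` are hypothetical, and it does not implement the linear mixed-effects model the reviewer mentions. It averages the electrode-level alpha-minus-broadband difference within each participant so that the direction of the effect can be checked participant by participant.

```python
from collections import defaultdict
from statistics import mean

def per_participant_effect(records):
    """Group electrode-level differences by participant so that each
    participant's result can be reported (and checked) separately."""
    by_participant = defaultdict(list)
    for participant, alpha_size, broadband_size in records:
        by_participant[participant].append(alpha_size - broadband_size)
    return {pt: mean(diffs) for pt, diffs in sorted(by_participant.items())}

# Hypothetical electrodes: (participant, alpha pRF size, broadband pRF size) in deg.
records = [
    ("p1", 6.0, 2.0), ("p1", 5.0, 2.5),
    ("p2", 7.5, 3.0),
    ("p3", 4.0, 1.5), ("p3", 4.5, 2.5), ("p3", 6.0, 2.0),
]
effects = per_participant_effect(records)
print(effects)
print(all(d > 0 for d in effects.values()))  # True
```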

      Suggestion 3.2.

      I just don't get it. I would simply assign an electrode to V1-V3 or dorsolateral cortex based on which area has the highest probability. It doesn't make sense to me that an electrode with a 60% probability of being in dorsolateral cortex and only a 10% probability of being in V1-V3 would be assigned as both V1-V3 and dorsolateral. Also, what's the rationale for including such an electrode in the analysis for, say, V1-V3 (when we have only weak evidence that it's there)? I would either assign electrodes based on the highest probability or, alternatively, compute a weighted mean based on the probability of each electrode belonging to each region group (e.g., an electrode with a 40% probability of being in V1-V3 would get twice the weight of an electrode with a 20% probability), but this is more complicated.

      We have addressed this issue. Please refer to our response in Public Review (“Weakness 3.2 Separation between V1-V3 and dorsolateral”) for details.

      Suggestion 3.3.

      First, to exclude the possibility that alpha pRFs are larger simply because they fit the neural data worse, I would show whether there is a correlation between goodness-of-fit and pRF size (for alpha and broadband signals, separately). No [negative] correlation between goodness-of-fit and pRF size would be a good sign. I would also compare alpha and broadband receptive field sizes while controlling for goodness-of-fit (selecting electrodes with similar goodness-of-fit for both signals). If the results replicate this way, it would be convincing.

      Second, there are no statistical tests in section 2.4, possibly also in others. Even if you employ bootstrap / Monte-Carlo resampling methods you can extract a p-value.

      We have addressed this issue. Please refer to our response in Public Review Point 3.3 (“Alpha pRFs are larger than broadband pRFs”) for further details.

      Suggestion 3.4.

      Also, I don't understand the resampling procedure described in lines 652-660: "17.7 electrodes were assigned to V1-V3, 23.2 to dorsolateral, and 53 to either " - but 17.7 + 23.2 doesn't add up to 53. It also seems as if you assign visual areas differently in this resampling procedure than in the real data - "and randomly assigned each electrode to a visual area according to the Wang full probability distributions". If you assign in your actual data 27 electrodes to both visual areas, the same should be done in the resampling procedure (I would expect exactly 35 V1-V3 and 45 dorsolateral electrodes in every resampling, just the pRFs will be shuffled across electrodes).

      We apologize for the confusion.

      We fixed the sentence above, clarified the caption to Table 2, and also explained the overall strategy of probabilistic resampling better. See response to Public Review point 3.2 for details.
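A minimal sketch of why per-group electrode counts under probabilistic resampling are fractional on average and need not match fixed totals: the expected number of electrodes in a group is the sum of the electrodes' probabilities for that group, and part of each electrode's probability can fall on "neither". The three electrodes, their probabilities, and the function names below are hypothetical; the first electrode reuses the 20% / 30% / 50% example given in the response to Weakness 3.2.

```python
import random

# Hypothetical assignment probabilities for three electrodes; each row sums to 1.
electrodes = [
    {"V1-V3": 0.2, "dorsolateral": 0.3, "neither": 0.5},
    {"V1-V3": 0.7, "dorsolateral": 0.1, "neither": 0.2},
    {"V1-V3": 0.0, "dorsolateral": 0.9, "neither": 0.1},
]

def expected_counts(probs):
    """Expected electrodes per label: the sum of each electrode's probability for that label."""
    return {label: sum(p[label] for p in probs) for label in probs[0]}

def draw_labels(probs, rng):
    """One resample: each electrode gets exactly one label, drawn from its probabilities."""
    return [rng.choices(list(p), weights=list(p.values()))[0] for p in probs]

def mean_counts(probs, n_draws=20000, seed=0):
    """Average number of electrodes per label across many resamples."""
    rng = random.Random(seed)
    totals = {label: 0 for label in probs[0]}
    for _ in range(n_draws):
        for label in draw_labels(probs, rng):
            totals[label] += 1
    return {label: n / n_draws for label, n in totals.items()}

print({k: round(v, 2) for k, v in expected_counts(electrodes).items()})
# {'V1-V3': 0.9, 'dorsolateral': 1.3, 'neither': 0.8}
print(mean_counts(electrodes))
```

In any single resample the group sizes are whole numbers that vary from draw to draw; only their averages equal the probability sums, which is how fractional counts such as those quoted in the reviewer's comment can arise.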

      Suggestion 3.5.

      These are rather technical comments but I believe they are crucial points to address in order to support your claims. I genuinely think your results are potentially interesting and important but these issues need to be first addressed in a revision. I also think your study may carry implications beyond just the visual domain, as alpha suppression is observed for different sensory modalities and cortical regions. Might be useful to discuss this in the discussion section.

      Agree. We added a paragraph on this point to the Discussion (very end of 3.2).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Astrocytes are known to express neuroligins 1-3. Within neurons, these cell adhesion molecules perform important roles in synapse formation and function. Within astrocytes, a significant role for neuroligin 2 in determining excitatory synapse formation and astrocyte morphology was shown in 2017. However, there has been no assessment of what happens to synapses or astrocyte morphology when all three major forms of neuroligins within astrocytes (isoforms 1-3) are deleted using a well characterized, astrocyte specific, and inducible cre line. By using such selective mouse genetic methods, the authors here show that neuroligin 1-3 expression in astrocytes is not consequential for synapse function or for astrocyte morphology. They reach these conclusions with careful experiments employing quantitative western blot analyses, imaging and electrophysiology. They also characterize the specificity of the cre line they used. Overall, this is a very clear and strong paper that is supported by rigorous experiments. The discussion considers the findings carefully in relation to past work. This paper is of high importance, because it now raises the fundamental question of exactly what neuroligins 1-3 are actually doing in astrocytes. In addition, it enriches our understanding of the mechanisms by which astrocytes participate in synapse formation and function. The paper is very clear, well written and well illustrated with raw and average data.

      We thank the reviewer for the balanced and informative summary.

      Reviewer #2 (Public Review):

      In the present manuscript, Golf et al. investigate the consequences of astrocyte-specific deletion of Neuroligin family cell adhesion proteins on synapse structure and function in the brain. Decades of prior research had shown that Neuroligins mediate their effects at synapses through their role in the postsynaptic compartment of neurons and their transsynaptic interaction with presynaptic Neurexins. More recently, it was proposed for the first time that Neuroligins expressed by astrocytes can also bind to presynaptic Neurexins to regulate synaptogenesis (Stogsdill et al. 2017, Nature). However, several aspects of the model proposed by Stogsdill et al. on astrocytic Neuroligin function conflict with prior evidence on the role of Neuroligins at synapses, prompting Golf et al. to further investigate astrocytic Neuroligin function in the current study. Using postnatal conditional deletion of Neuroligins 1, 2 and 3 specifically from astrocytes, Golf et al. show that virtually no changes in the expression of synaptic proteins or in the properties of synaptic transmission at either excitatory or inhibitory synapses are observed. Moreover, no alterations in the morphology of astrocytes themselves were found. The authors conclude that while Neuroligins are indeed expressed in astrocytes and are hence likely to play some role there, this role does not include any direct consequences on synaptic structure and function, in direct contrast to the model proposed by Stogsdill et al.

      Overall, this is a strong study that addresses an important and highly relevant question in the field of synaptic neuroscience. Neuroligins are not only key regulators of synaptic function, they have also been linked to numerous psychiatric and neurodevelopmental disorders, highlighting the need to precisely define their mechanisms of action. The authors take a wide range of approaches to convincingly demonstrate that under their experimental conditions, no alterations in the levels of synaptic proteins or in synaptic transmission at excitatory or inhibitory synapses, or in the morphology of astrocytes, are observed.

      We are also grateful for this reviewer’s constructive comments.

      One caveat to this study is that the authors do not directly provide evidence that their Tamoxifen-inducible conditional deletion paradigm does indeed result in efficient deletion of all three Neuroligins from astrocytes. Using a Cre-dependent tdTomato reporter line, they show that tdTomato expression is efficiently induced by the current paradigm, and they refer to a prior study showing efficient deletion of Neuroligins from neurons using the same conditional Nlgn1-3 mouse lines but a different Cre driver strategy. However, neither of these approaches directly provide evidence that all three Neuroligins are indeed deleted from astrocytes in the current study. In contrast, Stogsdill et al. employed FACS and qPCR to directly quantify the loss of Nlgn2 mRNA from astrocytes. This leaves the current Golf et al. study somewhat vulnerable to the criticism, however unlikely, that their lack of synaptic effects may be a consequence of incomplete Neuroligin deletion, rather than a true lack of effect of astrocytic Neuroligins.

      The concern is valid. In the original submission of this paper, we did not establish that the Cre recombinase we used actually deleted neuroligins in astrocytes. We have now addressed this issue in the revised paper with new experiments as described below.

      However, the reviewer’s impression that the Stogsdill et al. paper confirmed full deletion of Nlgn2 reflects a misunderstanding of the data in that paper. The reviewer is correct that Stogsdill et al. performed FACS to test the efficacy of GLAST-Cre-mediated deletion in Nlgn2-flox mice, followed by qRT-PCR comparing heterozygous with homozygous mutant mice. With their approach, no wild-type control could be used, because wild-type cells would lack reporter expression. However, this experiment does NOT allow conclusions about the degree of recombination, either overall (i.e., recombination in all astrocytes regardless of TdT expression) or in TdT+ astrocytes, because it does not quantify recombination. To quantify the degree of recombination, the authors would have had to perform genomic PCR measurements.

      The problem with the data on the degree of recombination in the Stogsdill et al. (2017) paper, as we understand them, is two-fold.

      First, the GLAST-Cre line targets only ~40-70% of astrocytes, at least as evidenced by highly sensitive Cre-reporter mice in a variety of studies using this Cre line. The 40-70% variation is likely due to differences in the reporter mice and the tamoxifen injection schedule used. In comparison, we are targeting most astrocytes using the Aldh1l1-CreERT2 mice. Moreover, GLAST-Cre mice exhibit neuronal off-targeting, which could account for at least some of the remaining Nlgn2 qRT-PCR signal in the FACS-sorted cells. As we describe next, this signal also likely comes from astrocytes in which recombination was incomplete. This is the reason why we, like everyone else, now use the Aldh1l1-Cre line, which has been shown to be more efficient both in the overall targeting of astrocytes (i.e., nearly complete) and in the level of recombination observed in reporter(+) astrocytes.

      Second, Stogsdill et al. detected a significant decrease in the Nlgn2 qRT-PCR signal in the FACS-sorted homozygous Nlgn2 KO cells compared to the heterozygous Nlgn2 KO cells, but the Nlgn2 qRT-PCR signal was still quite large. The data are presented as normalized to the HET condition. As a result, we don’t know the true level of gene deletion (i.e., compared to TdT- astrocytes). For example, based on the Stogsdill et al. data, the HET manipulation could have induced only a 20% reduction in Nlgn2 mRNA levels in TdT(+) astrocytes, in which case the KO would have produced a 40% reduction in Nlgn2 mRNA in TdT(+) astrocytes. Moreover, based on our own experience with the GLAST-Cre line, it is possible that the reporter may also not turn on in some astrocytes in which other alleles have been independently recombined, just as some TdT(+) astrocytes would still be wild-type or heterozygous for Nlgn2. Thus, in the absence of PCR of genomic DNA from isolated cells, it is impossible to calculate the actual percentage of recombination from these data, even in TdT(+) cells. Alternatively, comparing mRNA levels in wild-type controls versus cKO mice using primers sensitive to the floxed sequences would also have yielded a much better estimate of the recombination efficiency.
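The arithmetic in the example above can be made explicit with a short Python sketch. Because the KO signal is reported relative to HET, the same normalized value is compatible with very different absolute losses. The 20% HET reduction comes from the example in the paragraph above and implies a KO/HET ratio of 0.75 (0.6 / 0.8); the second scenario (a 50% HET reduction) and the function name are hypothetical.

```python
def absolute_loss(het_level, ko_over_het):
    """Fractional mRNA loss in KO cells relative to wild type.

    het_level:   HET expression as a fraction of wild type (not measured when
                 data are normalized to HET)
    ko_over_het: the KO/HET ratio that a HET-normalized plot reports
    """
    return 1.0 - het_level * ko_over_het

ko_over_het = 0.75  # 20% loss in HET and 40% loss in KO give KO/HET = 0.6 / 0.8

for het_level in (0.8, 0.5):
    loss = absolute_loss(het_level, ko_over_het)
    print(f"HET at {het_level:.0%} of WT -> KO loss {loss:.1%}")
# HET at 80% of WT -> KO loss 40.0%
# HET at 50% of WT -> KO loss 62.5%
```

Both scenarios would appear identical on a HET-normalized plot, which is why a wild-type reference (or genomic PCR) is needed to estimate the true deletion.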

      In summary, it is unclear whether the Nlgn2 deletion in the Stogsdill et al. paper was substantial or marginal – it is simply impossible to tell.

      Reviewer #3 (Public Review):

      This study investigates the roles of astrocytes in the regulation of synapse development and astrocyte morphology using conditional KO mice carrying mutations of three neuroligins (Nlgn1-3) in astrocytes, with the deletion starting at two different time points (P1 and P10/11). The authors use morphological, electrophysiological, and cell-biological approaches and find no differences in synapse formation or astrocyte cytoarchitecture in the mutant hippocampus and visual cortex. These results differ from previous results (Stogsdill et al., 2017), although the authors discuss several ways in which the differences could have arisen. This study provides important information on how astrocytes and neurons interact with each other to coordinate neural development and function. The experiments were well-designed, and the data are of high quality.

      We also thank this reviewer for helpful comments!

      Recommendations for the authors:

      This project was meant to rigorously test the intriguing overall question of whether neuroligins, which are abundantly expressed in astrocytes, regulate synapse formation as astrocytic synapse organizers. The goal of the paper was NOT to confirm or dispute the conclusion by Stogsdill et al. (Nature 2017) that Nlgn2 expressed in astrocytes is essential for excitatory synapse formation and that astrocytic Nlgn1-3 are required for proper astrocyte morphogenesis. Instead, the project was meant to address the much broader question of whether the abundant expression of any neuroligin, not just Nlgn2, in astrocytes is essential for neuronal excitatory or inhibitory synapse formation and/or for the astrocyte cytoarchitecture. We felt that this was an important question independent of the Stogsdill et al. paper. In our experiments, we analyzed young adult mice, a time point chosen deliberately to avoid observing a possible developmental delay rather than a fundamental function that extends beyond development.

      We do recognize that the conclusion by Stogsdill et al. (2017) that Nlgn2 expression in astrocytes is essential for excitatory synapse formation was very exciting to the field but contradicted a large literature demonstrating that Nlgn2 protein is exclusively localized to inhibitory synapses and absent from excitatory synapses (to name just a few papers, see Graf et al., Cell 2004; Varoqueaux et al., Eur. J. Cell Biol. 2004; Patrizi et al., PNAS 2008;  Hoon et al., J. Neurosci. 2009). In addition, the conclusion of Stogsdill et al. that astrocytic Nlgn2 specifically drove excitatory synapse formation was at odds with previous findings documenting that the constitutive deletion of Nlgn2 in all cells, including astrocytes, has no effect on excitatory synapse numbers (again, to name a few papers, see Varoqueaux et al., Neuron 2006; Blundell et al., Genes Brain Behav. 2008; Poulopoulos et al., Neuron 2009; Gibson et al., J. Neurosci. 2009). These contradictions conferred further urgency to our project, but please note that this project was primarily driven by our curiosity about the function of astrocytic neuroligins, not by a fruitless desire to test the validity of one particular Nature paper.

      The general goal of our paper notwithstanding, few papers from our lab have received as much attention and as many negative comments on social media as this paper when it was published as a preprint. Because we take these criticisms seriously, over the last year we have performed extensive additional experiments to ensure that our findings are well founded. We feel that, on balance, our data are incompatible with the notion that astrocytic neuroligins play a fundamental role in excitatory synapse formation but are consistent with other prior findings obtained with neuroligin KO mice. In the new data we added to the paper, we not only characterized the Cre-mediated deletion of neuroligins in depth, but also employed an independent second system (human neurons cultured on mouse glia) to further validate our conclusions, as described below. Although we believe that our results are incompatible with the notion that astrocytic neuroligins fundamentally regulate excitatory or inhibitory synapse formation, we also conclude with regret that we still don’t know what astrocytic neuroligins actually do. Thus, the function of astrocytic neuroligins, as there surely must be one, remains a mystery.

      Finally, there are many possible explanations for the discrepancies between our conclusions and those of Stogsdill et al. as described in our paper. Most of these explanations are technical and may explain why not only our, but also the results of many other previous studies from multiple labs, are inconsistent with the conclusions by Stogsdill et al. (2017), as discussed in detail in the revised paper.

      Reviewer #1 (Recommendations For The Authors):

      The paper is very clear and well written. I have only one comment and that is to increase the sizes of Figs 2, 4 and 6 so that the imaging panels can be seen more clearly. Also, although I know the n numbers are provided in the figure legends, the authors may help the reader by providing them in the results when key data and findings are reported.

      We agree and have followed the reviewer’s suggestions as best as we could.

      Reviewer #2 (Recommendations For The Authors):

      (1) Given the strength and importance of the claims that the authors make, I would highly recommend adding some quantitative evidence regarding the efficacy of deletion in astrocytes, e.g. using the same strategy as in Stogsdill et al. As unlikely as it may be that Neuroligin deletion is in fact incomplete, this possibility cannot be excluded unless directly measured. To avoid future discussions on this subject, it seems that the onus is on the authors to provide this information.

      We concur that this is an important point and have devoted a year-long effort to address it. Note, however, that the strategy employed by Stogsdill et al. does not actually allow conclusions about their recombination efficiency. As described above, it only allows the conclusion that some recombination took place. The Stogsdill et al. Nature paper (2017) is a bit confusing on this point. This approach is thus not appropriate to address the question raised by the reviewer.

      We have performed two experiments to address the issue raised by the reviewer.

      First, we used a viral (i.e., AAV2/5) approach to express Rpl22 with a triple HA-tag, also known as Ribotag, which allows us to purify ribosome-bound mRNA from targeted cells for downstream gene expression analysis. The novel construct is driven by the GfaABC1D promoter and includes two additional features that make it particularly useful. First, upstream of Ribotag is a membrane-targeted Lck-mVenus followed by a self-cleaving P2A sequence, which allows easy visualization of targeted astrocytes. Second, we incorporated a cassette of four copies of six miRNA targeting sequences (4x6T) for miR-124, as recently published (Gleichman et al., 2023), to eliminate off-target expression in neurons. Based on qPCR analysis, the updated construct allowed >95% de-enrichment of neuronal mRNA and slightly improved observed recombination rates (~10% per gene) relative to an earlier version without 4x6T. Mice that were injected with tamoxifen at P1, as in other experiments in the paper, were then stereotactically injected at ~P35-40 within the dorsal hippocampus with AAV2/5-GfaABC1D-Lck-mVenus-P2A-Rpl22-HA-4x6T. Approximately 3 weeks later, acute slices were prepared and visualized for fluorescence, and both CA1 and the partially targeted nearby cortex were isolated for downstream ribosome affinity purification with HA antibodies. Total RNA was saved as input. qPCR was performed using assays that are sensitive to the exons that are floxed in the Nlgn123 cKO mice, so that our quantifications are not confounded by potential differences in nonsense-mediated decay. Our control data reveal a striking enrichment of an astrocyte marker gene (e.g., aquaporin-4) and de-enrichment of genes for other cell types. In CA1, we observed robust loss of Nlgn3 (~96%), Nlgn2 (~86%), and Nlgn1 (65%) gene expression. In the cortex, we observed a similarly robust loss of Nlgn3 (93%), Nlgn2 (83%), and Nlgn1 (72%) expression.
Given that our targeting of astrocytes based on Ai14 Cre-reporter mice was ~90-99%, these reductions are striking and definitive. The existence of some residual transcript reflects the presence of a small population of astrocytes heterozygous for Nlgn2 and Nlgn3. In contrast, Nlgn1 appears more difficult to recombine and it is likely that some astrocytes are either heterozygous or homozygous knockout cells. Although it is thus possible that Nlgn1 could provide some compensation in our experiments, it is worth noting that Stogsdill et al. found that only Nlgn2 and Nlgn3 knockdown with shRNAs resulted in impaired astrocyte morphology by P21. Moreover, they found that Nlgn2 cKO in astrocytes with PALE of a Cre-containing pDNA impaired astrocyte morphology in a gene-dosage dependent manner and suppressed excitatory synapse formation at P21. Thus, our inability to delete all of Nlgn1 doesn’t readily explain contradictions between our findings and theirs.
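For readers less familiar with how percent-loss figures relate to qPCR readouts, here is a minimal Python sketch assuming the widely used 2^-ΔΔCt method with a reference gene and roughly 100% amplification efficiency. The response does not state which normalization was used, and the Ct values below are hypothetical, not data from this study. Each cycle of ΔΔCt corresponds to a two-fold change, so a ~65% loss corresponds to a ΔΔCt of only about 1.5 cycles, whereas a loss above 90% requires more than 3 cycles.

```python
import math

def fold_change(ct_target_cko, ct_ref_cko, ct_target_ctrl, ct_ref_ctrl):
    """2^-ddCt: target expression in cKO relative to control, each normalized to a reference gene."""
    d_cko = ct_target_cko - ct_ref_cko
    d_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(d_cko - d_ctrl)

def percent_loss(fc):
    """Percent reduction corresponding to a fold change below 1."""
    return 100.0 * (1.0 - fc)

def ddct_for_loss(loss_percent):
    """Cycle shift needed to produce a given percent loss (inverse of the two functions above)."""
    return math.log2(1.0 / (1.0 - loss_percent / 100.0))

# Hypothetical Ct values: after normalization, the target appears 3 cycles later in the cKO.
fc = fold_change(28.0, 20.0, 25.0, 20.0)
print(fc, percent_loss(fc))  # 0.125 87.5
print(round(ddct_for_loss(65), 2))  # 1.51
```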

      Second, in an independent approach, we cultured glia from mouse quadruple conditional Nlgn1234 KO mice and infected the glia with lentiviruses expressing inactive (ΔCre, control) or active Cre-recombinase. We confirmed complete recombination by PCR. We then cultured human neurons forming excitatory synapses on the glia expressing or lacking neuroligins and measured the frequency and amplitude of mEPSCs as a proxy for synapse numbers and synaptic function. As shown in the new Figure 9, we detected no significant changes in mEPSCs, demonstrating in this independent system that the glial neuroligins do not detectably influence excitatory synapse formation.

      (2) Along the same lines, the authors should be careful not to overstate their findings in this direction. For example, the figure caption for Figure 2 reads 'Nlgn1-3 are efficiently and selectively deleted in astrocytes by crossing triple Nlgn1-3 conditional KO mice with Aldh1l1-CreERT2 driver mice and inducing Cre-activity with tamoxifen early during postnatal development'. This is not technically correct and should be modified to reflect that the authors are not in fact assessing deletion of Nlgn1-3, but only expression of a tdTomato reporter.

      We agree – this is essentially the same criticism as comment #1.

      (3) In general, the animal numbers used for the experiments are rather low. With an n = 4 for most experiments, only large abnormalities would be detected anyway, while smaller alterations would not reach statistical significance due to the inherent biological and technical variance. For the most part, this is not a concern, since there really is no difference between WTs and Nlgn1-3 cKOs. However, trends are observed in some cases, and it is conceivable that these would become significant changes with larger n's, e.g. Figure 3H (vGluT2); Figure 4E (vGluT2 S.P., D.G.); Figure 6D (vGluT2). Increasing the numbers to n = 6 here would greatly strengthen the claims that no differences are observed.

      We concur that small differences would not have been detected in our experiments. However, given the very large phenotypes of the neuroligin deletions in neurons and the phenotypes reported by Stogsdill et al. (2017), which also did not employ a large number of animals, we feel that a very small phenotype in astrocytes would not have been very informative.
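The sample-size point raised by the reviewer can be illustrated with a small Monte Carlo power simulation in Python. This is a sketch under stated assumptions: normally distributed measurements with equal variance, a two-sided Student's t-test at α = 0.05, and a hypothetical true difference of 1.5 standard deviations. None of these numbers come from the study. It estimates how often such a difference would reach significance with 4 versus 6 animals per group.

```python
import random

# Two-sided critical t values at alpha = 0.05 from standard t tables (df = 2n - 2).
T_CRIT = {6: 2.447, 10: 2.228}

def _mean(x):
    return sum(x) / len(x)

def _var(x):
    m = _mean(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

def t_stat(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * _var(a) + (nb - 1) * _var(b)) / (na + nb - 2)
    return (_mean(a) - _mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5

def simulated_power(n, effect_sd, n_sim=4000, seed=1):
    """Fraction of simulated experiments (n animals per group, true difference of
    effect_sd standard deviations) in which the t-test rejects at alpha = 0.05."""
    rng = random.Random(seed)
    crit = T_CRIT[2 * n - 2]
    hits = 0
    for _ in range(n_sim):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mutant = [rng.gauss(effect_sd, 1.0) for _ in range(n)]
        if abs(t_stat(mutant, control)) > crit:
            hits += 1
    return hits / n_sim

for n in (4, 6):
    print(f"n = {n} per group: power ~ {simulated_power(n, 1.5):.2f}")
```

Under these assumptions power is higher with n = 6 than with n = 4, which is the reviewer's argument; whether that matters depends on the smallest effect one would consider biologically meaningful, which is the authors' counterpoint.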

      Minor points:

      (1) Please state the exact genetic background for the mouse lines used.

      Our lab generally uses hybrid CD1/Bl6 mice to avoid artifacts produced by inbred genetic mutations in so-called ‘pure’ lines, especially Bl6 mice. This standard protocol was followed in the present study. Thus, the mice are on a mixed CD1/Bl6 hybrid background.

      Reviewer #3 (Recommendations For The Authors):

      (1) Figure 4 demonstrates that neuroligin 1-3 deletions restricted to astrocytes do not affect the number of excitatory and inhibitory synapses in layer IV of the primary visual cortex. This conclusion could be further strengthened if the authors could provide electrophysiological evidence such as mE/IPSCs.

      We agree but have chosen a different avenue to further test our conclusions because slice electrophysiological experiments are time-consuming, labor intensive, and difficult to quantitate, especially in cortex.

      Specifically, we have co-cultured human neurons with astrocytes that either contain or lack neuroligins (new Fig. 9). With this experimental design, we have total control over ALL neuroligins in astrocytes. Electrophysiological recordings then demonstrated that the complete deletion of all glial neuroligins has no effect on mEPSC frequencies and amplitudes. Although clearly much more needs to be done, the new results confirm in an independent system that glial neuroligins have no effect on synapse formation in the neurons, even though neurons depend on astrocytes for synaptogenic factors as Ben Barres brilliantly showed a decade ago. However, it is important to note that dissociated glia in culture, while synaptogenic, are reactive and may not faithfully recapitulate all roles of astrocytes in synaptogenesis.

      (2) It would help readers if the images showing the punctate double marker stainings of excitatory/inhibitory synapses are presented in merged colors (i.e., yellow colors for red and green puncta colors).

      We have tried to improve the visualization of the rather voluminous studies we performed and illustrate in the figures as best as we could.

      (3) The resolutions of the images in the figures are not good, although I guess it is because the images are for review processes.

      We apologize and would like to assure the reviewer that we are supplying high-resolution images to the journal.

      (4) Typos in lines 82 and 274.

      We have corrected these errors.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their thoughtful feedback. We have made substantial revisions to the manuscript to address each of their comments, as we detail below. We want to highlight one major change in particular that addresses a concern raised by both reviewers: the role of the drift rate in our models. Motivated by their astute comments, we went back through our models and realized that we had made a particular assumption that deserved more scrutiny. We previously assumed that the process of encoding the observations made correct use of the objective, generative correlation, but then the process of calculating the weight of evidence used a mis-scaled, subjective version of the correlation. These assumptions led us to scale the drift rate in the model by a term that quantified how the standard deviation of the observation distribution was affected by the objective correlation (encoding), but to scale the bound height by the subjective estimate of the correlation (evidence weighing). However, we realized that encoding may also depend on the subjective correlation experienced by the participant. We have now tested several alternative models and found that the best-fitting model assumes that a single, subjective estimate of the correlation governs both encoding and evidence weighing. An important consequence of updating our models in this way is that we can now account for the behavioral data without needing the additional correlation-dependent drift terms (which, as reviewer #2 pointed out, were difficult to explain).

      We also note that we changed the title slightly, replacing “weighting” with “weighing” for consistency with our usage throughout the manuscript.

      Please see below for more details about this important point and our responses to the reviewers’ specific concerns. 

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The behavioral strategies underlying decisions based on perceptual evidence are often studied in the lab with stimuli whose elements provide independent pieces of decision-related evidence that can thus be equally weighted to form a decision. In more natural scenarios, in contrast, the information provided by these pieces is often correlated, which impacts how they should be weighted. Tardiff, Kang & Gold set out to study decisions based on correlated evidence and compare the observed behavior of human decision-makers to normative decision strategies. To do so, they presented participants with visual sequences of pairs of localized cues whose location was either uncorrelated, or positively or negatively correlated, and whose mean location across a sequence determined the correct choice. Importantly, they adjusted this mean location such that, when correctly weighted, each pair of cues was equally informative, irrespective of how correlated it was. Thus, if participants follow the normative decision strategy, their choices and reaction times should not be impacted by these correlations. While Tardiff and colleagues found no impact of correlations on choices, they did find an impact on reaction times, suggesting that participants deviated from the normative decision strategy. To assess the degree of this deviation, Tardiff et al. adjusted drift-diffusion models (DDMs) for decision-making to process correlated decision evidence. Fitting these models to the behavior of individual participants revealed that participants considered correlations when weighing evidence, but did so with a slight underestimation of the magnitude of this correlation. This finding led Tardiff et al. to conclude that participants followed a close-to-normative decision strategy that adequately took into account correlated evidence.

      Strengths:

      The authors adjust a previously used experimental design to include correlated evidence in a simple, yet powerful way. The way it does so is easy to understand and intuitive, such that participants don't need extensive training to perform the task. Limited training makes it more likely that the observed behavior is natural and reflective of everyday decision-making. Furthermore, the design allowed the authors to make the amount of decision-related evidence equal across different correlation magnitudes, which makes it easy to assess whether participants correctly take account of these correlations when weighing evidence: if they do, their behavior should not be impacted by the correlation magnitude.

      The relative simplicity with which correlated evidence is introduced also allowed the authors to fall back on the well-established DDM for perceptual decisions, which has few parameters, is known to implement the normative decision strategy in certain circumstances, and enjoys a great deal of empirical support. The authors show how correlations ought to impact these parameters, and which changes in parameters one would expect to see if participants misestimate these correlations or ignore them altogether (i.e., estimate correlations to be zero). This allowed them to assess the degree to which participants took into account correlations on the full continuum from perfect evidence weighting to complete ignorance. With this, they could show that participants in fact performed rational evidence weighting if one assumed that they slightly underestimated the correlation magnitude.

      Weaknesses:

      The experiment varies the correlation magnitude across trials such that participants need to estimate this magnitude within individual trials. This has several consequences:

      (1) Given that correlation magnitudes are estimated from limited data, the (subjective) estimates might be biased towards their average. This implies that, while the amount of evidence provided by each 'sample' is objectively independent of the correlation magnitude, it might subjectively depend on the correlation magnitude. As a result, the normative strategy might differ across correlation magnitudes, unlike what is suggested in the paper. In fact, the observed underestimation of the correlation magnitude might itself correspond to the normative strategy.

      We thank the reviewer for raising this interesting point, which we now address directly with new analyses including model fits (pp. 15–24). These analyses show that the participants were computing correlation-dependent weights of evidence from observation distributions that reflected suboptimal misestimates of correlation magnitudes. This strategy is normative in the sense that it is the best that they can do, given the encoding suboptimality. However, as we note in the manuscript, we do not know the source of the encoding suboptimality (pp. 23–24). We thus do not know if there might be a strategy they could have used to make the encoding more optimal.

      (2) The authors link the normative decision strategy to putting a bound on the log-likelihood ratio (logLR), as implemented by the two decision boundaries in DDMs. However, as the authors also highlight in their discussion, the 'particle location' in DDMs ceases to correspond to the logLR as soon as the strength of evidence varies across trials and isn't known by the decision maker before the start of each trial. In fact, in the used experiment, the strength of evidence is modulated in two ways:

      (i) by the (uncorrected) distance of the cue location mean from the decision boundary (what the authors call the evidence strength) and

      (ii) by the correlation magnitude.

      Both vary pseudo-randomly across trials and are unknown to the decision-maker at the start of each trial. As previous work has shown (e.g., Kiani & Shadlen, 2009; Drugowitsch et al., 2012), the normative strategy then requires averaging over different evidence strength magnitudes while forming one's belief. This averaging causes the 'particle location' to deviate from the logLR. This deviation makes it unclear whether the DDM used in the paper indeed implements the normative strategy, or is even a good approximation to it.

      We appreciate this subtle, but important, point. We now clarify that the DDM we use includes degrees of freedom that are consistent with normative decision processes that rely on the imperfect knowledge that participants have about the generative process on each trial, specifically: 1) a single drift-rate parameter that is fit to data across different values of the mean of the generative distribution, which is based on the standard assumption for these kinds of task conditions in which stimulus strength is varied randomly from trial-to-trial and thus prevents the use of exact logLR (which would require stimulus strength-specific scale factors; Gold and Shadlen, 2001); 2) the use of a collapsing bound, which in certain cases (including our task) is thought to support a stimulus strength-dependent calibration of the decision variable to optimize decisions (Drugowitsch et al, 2012); and 3) free parameters (one per correlation) to account for subjective estimates of the correlation, which affected the encoding of the observations that are otherwise weighed in a normative manner in the best-fitting model.

      Also, to clarify our terminology, we define the objective evidence strength as the expected logLR in a given condition, which for our task is dependent on both the distance of the mean from the decision boundary and the correlation (p. 7). 
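      To make this definition concrete, here is a minimal sketch under a simplified generative model that we assume purely for illustration and that is not taken from the manuscript: the two cue locations in a pair are bivariate normal with equal standard deviation sigma, correlation rho, and a common mean of +mu or -mu relative to a category boundary at 0. Under these assumptions the logLR of a pair depends only on s = x1 + x2, with weight 2*mu / (sigma**2 * (1 + rho)), and its expected value under the +mu hypothesis is 4*mu**2 / (sigma**2 * (1 + rho)). Scaling mu by sqrt(1 + rho) therefore holds the expected logLR fixed across correlations. The function names are hypothetical.

```python
import math


def llr_weight(mu: float, sigma: float, rho: float) -> float:
    """Weight w such that w * (x1 + x2) is the logLR of mean +mu vs. -mu,
    assuming both cues share the mean, have SD sigma and correlation rho."""
    return 2.0 * mu / (sigma ** 2 * (1.0 + rho))


def expected_llr(mu: float, sigma: float, rho: float) -> float:
    """Expected logLR per pair under the +mu hypothesis, where E[x1 + x2] = 2 * mu."""
    return llr_weight(mu, sigma, rho) * 2.0 * mu


def matched_mu(mu0: float, rho: float) -> float:
    """Mean giving the same expected logLR at correlation rho as mu0 gives at rho = 0."""
    return mu0 * math.sqrt(1.0 + rho)


if __name__ == "__main__":
    sigma, mu0 = 1.0, 0.2  # hypothetical values
    for rho in (-0.6, -0.2, 0.0, 0.2, 0.6):
        mu = matched_mu(mu0, rho)
        print(f"rho={rho:+.1f}  mu={mu:.3f}  "
              f"weight={llr_weight(mu, sigma, rho):.3f}  "
              f"E[logLR]={expected_llr(mu, sigma, rho):.3f}")
```

      Even with matched means, the weight per pair under these assumptions still falls as rho increases (it equals 2*mu0 / (sigma**2 * sqrt(1 + rho))), so an observer that ignored the correlation would misweigh the evidence.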

      Given that participants observe 5 evidence samples per second and on average require multiple seconds to form their decisions, it might be that they are able to form a fairly precise estimate of the correlation magnitude within individual trials. However, whether this is indeed the case is not clear from the paper.

      These points are now addressed directly in Results (pp. 23–24) and Figure 7 supplemental figures 1–3. Specifically, we show that, as the reviewer correctly surmised above, empirical correlations computed on each trial tended to be biased towards zero (Fig 7–figure supplement 1). However, two other analyses were not consistent with the idea that participants’ decisions were based on trial-by-trial estimates of the empirical correlations: 1) those with the shortest RTs did not have the most-biased estimates (Fig 7–figure supplement 2), and 2) there was no systematic relationship between objective and subjective fit correlations across participants (Fig 7–figure supplement 3).
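      The tendency of per-trial correlation estimates to shrink toward zero when only a few pairs are observed can be illustrated with a small simulation. This is a sketch, not the authors' analysis: it assumes bivariate-normal pairs with unit variances and a true correlation of 0.6, and it computes Pearson r separately for each simulated trial. At the stated rate of 5 samples per second, 5 pairs correspond to roughly 1 s of viewing; the trial count and the function name are arbitrary choices.

```python
import numpy as np


def sample_correlations(rho, n_pairs, n_trials, seed=0):
    """Pearson r computed separately for each simulated trial containing
    n_pairs bivariate-normal pairs with true correlation rho."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    # shape: (n_trials, n_pairs, 2)
    pairs = rng.multivariate_normal([0.0, 0.0], cov, size=(n_trials, n_pairs))
    a = pairs[..., 0] - pairs[..., 0].mean(axis=1, keepdims=True)
    b = pairs[..., 1] - pairs[..., 1].mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))


rho = 0.6
for n_pairs in (5, 10, 50):
    r = sample_correlations(rho, n_pairs, n_trials=20_000)
    print(f"n_pairs={n_pairs:3d}  mean r={r.mean():.3f}  sd r={r.std():.3f}")
```

      With few pairs per trial, the average r falls below the true value and its spread is large; both effects diminish as more pairs are observed.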

      Furthermore, the authors capture any underestimation of the correlation magnitude by an adjustment to the DDM bound parameter. They justify this adjustment by asking how this bound parameter needs to be set to achieve correlation-independent psychometric curves (as observed in their experiments) even if participants use a 'wrong' correlation magnitude to process the provided evidence. Curiously, however, the drift rate, which is the second critical DDM parameter, is not adjusted in the same way. If participants use the 'wrong' correlation magnitude, then wouldn't this lead to a mis-weighting of the evidence that would also impact the drift rate? The current model does not account for this, such that the provided estimates of the mis-estimated correlation magnitudes might be biased.

      We appreciate this valuable comment, and we agree that we previously neglected the potential impact of correlation misestimates on evidence strength. As we now clarify, the correlation enters these models in two ways: 1) via its effect on how the observations are encoded, which involves scaling both the drift and the bound; and 2) via its effect on evidence weighing, which involves scaling only the bound (pp. 15–18). We previously assumed that only the second form of scaling might involve a subjective (mis-)estimate of the correlation. We now examine several models that also include the possibility of either or both forms using subjective correlation estimates. We show that a model that assumes that the same subjective estimate drives both encoding and weighing (the “full-rho-hat” model) best accounts for the data. This model provides better fits (after accounting for differences in numbers of parameters) than models with: 1) no correlation-dependent adjustments (“base” model), 2) separate drift parameters for each correlation condition (“drift” model), 3) optimal (correlation-dependent) encoding but suboptimal weighing (“bound-rho-hat” model, which was our previous formulation), 4) suboptimal encoding and weighing (“scaled-rho-hat” model), and 5) optimal encoding but suboptimal weighing and separate correlation-dependent adjustments to the drift rate (“bound-rho-hat plus drift” model). We have substantially revised Figures 5–7 and the associated text to address these points.

      Lastly, the paper makes it hard to assess how much better the participants' choices would be if they used the correct correlation magnitudes rather than underestimates thereof. This is important to know, as it only makes sense to strictly follow the normative strategy if it comes with a significant performance gain.

      We now include new analyses in Fig. 7 that demonstrate how much participants' choices and RT deviate from: 1) an ideal observer using the objective correlations, and 2) an observer who failed to adjust for the fit subjective correlation when weighing the evidence (i.e., using the subjective correlation for encoding but a correlation of zero for weighing). We now indicate that participants’ performance was quite close to that predicted by the ideal observer (using the true, objective correlation) for many conditions. Thus, we agree that they might not have had the impetus to optimize the decision process further, assuming it were possible under these task conditions.

      Reviewer #2 (Public review):

      Summary:

      This study by Tardiff, Kang & Gold seeks to: i) develop a normative account of how observers should adapt their decision-making across environments with different levels of correlation between successive pairs of observations, and ii) assess whether human decisions in such environments are consistent with this normative model.

      The authors first demonstrate that, in the range of environments under consideration here, an observer with full knowledge of the generative statistics should take both the magnitude and sign of the underlying correlation into account when assigning weight in their decisions to new observations: stronger negative correlations should translate into stronger weighting (due to the greater information furnished by an anticorrelated generative source), while stronger positive correlations should translate into weaker weighting (due to the greater redundancy of information provided by a positively correlated generative source). The authors then report an empirical study in which human participants performed a perceptual decision-making task requiring accumulation of information provided by pairs of perceptual samples, under different levels of pairwise correlation. They describe a nuanced pattern of results with effects of correlation being largely restricted to response times and not choice accuracy, which could partly be captured through fits of their normative model (in this implementation, an extension of the well-known drift-diffusion model) to the participants' behaviour while allowing for misestimation of the underlying correlations.

      Strengths:

      As the authors point out in their very well-written paper, appropriate weighting of information gathered in correlated environments has important consequences for real-world decision-making. Yet, while this function has been well studied for 'high-level' (e.g. economic) decisions, how we account for correlations when making simple perceptual decisions on well-controlled behavioural tasks has not been investigated. As such, this study addresses an important and timely question that will be of broad interest to psychologists and neuroscientists. The computational approach to arrive at normative principles for evidence weighting across environments with different levels of correlation is very elegant, makes strong connections with prior work in different decision-making contexts, and should serve as a valuable reference point for future studies in this domain. The empirical study is well designed and executed, and the modelling approach applied to these data showcases a deep understanding of relationships between different parameters of the drift-diffusion model and its application to this setting. Another strength of the study is that it is preregistered.

      Weaknesses:

      In my view, the major weaknesses of the study center on the narrow focus and subsequent interpretation of the modelling applied to the empirical data. I elaborate on each below:

      Modelling interpretation: the authors' preference for fitting and interpreting the observed behavioural effects primarily in terms of raising or lowering the decision bound is not well motivated and will potentially be confusing for readers, for several reasons. First, the entire study is conceived, in the Introduction and first part of the Results at least, as an investigation of appropriate adjustments of evidence weighting in the face of varying correlations. The authors do describe how changes in the scaling of the evidence in the drift-diffusion model are mathematically equivalent to changes in the decision bound - but this comes amidst a lengthy treatment of the interaction between different parameters of the model and aspects of the current task which I must admit to finding challenging to follow, and the motivation behind shifting the focus to bound adjustments remained quite opaque. 

      We appreciate this valuable feedback. We have revised the text in several places to make these important points more clearly. For example, in the Introduction we now clarify that “The weight of evidence is computed as a scaled version of each observation (the scaling can be applied to the observations or to the bound, which are mathematically equivalent; Green and Swets, 1966) to form the logLR” (p. 3). We also provide more details and intuition in the Results section for how and why we implemented the DDM the way we did. In particular, we now emphasize that the correlation enters these models in two ways: 1) via its effect on encoding the observations, which scales both the drift and the bound; and 2) via its effect on evidence weighing, which scales only the bound (pp. 15–18).

      Second, and more seriously, bound adjustments of the form modelled here do not seem to be a viable candidate for producing behavioural effects of varying correlations on this task. As the authors state toward the end of the Introduction, the decision bound is typically conceived of as being "predefined" - that is, set before a trial begins, at a level that should strike an appropriate balance between producing fast and accurate decisions. There is an abundance of evidence now that bounds can change over the course of a trial - but typically these changes are considered to be consistently applied in response to learned, predictable constraints imposed by a particular task (e.g. response deadlines, varying evidence strengths). In the present case, however, the critical consideration is that the correlation conditions were randomly interleaved across trials and were not signaled to participants in advance of each trial - and as such, what correlation the participant would encounter on an upcoming trial could not be predicted. It is unclear, then, how participants are meant to have implemented the bound adjustments prescribed by the model fits. At best, participants needed to form estimates of the correlation strength/direction (only possible by observing several pairs of samples in sequence) as each trial unfolded, and they might have dynamically adjusted their bounds (e.g. collapsing at a different rate across correlation conditions) in the process. But this is very different from the modelling approach that was taken. In general, then, I view the emphasis on bound adjustment as the candidate mechanism for producing the observed behavioural effects to be unjustified (see also next point).

      We again appreciate this valuable feedback and have made a number of revisions to try to clarify these points. In addition to addressing the equivalence of scaling the evidence and the bound in the Introduction, we have added the following section to Results (Results, p.18):

      “Note that scaling the bound in these formulations follows conventions of the DDM, as detailed above, to facilitate interpretation of the parameters. These formulations also raise an apparent contradiction: the “predefined” bound is scaled by subjective estimates of the correlation, but the correlation was randomized from trial to trial and thus could not be known in advance. However, scaling the bound in these ways is mathematically equivalent to using a fixed bound on each trial and scaling the observations to approximate logLR (see Methods). This equivalence implies that in the brain, effectively scaling a “predefined” bound could occur when assigning a weight of evidence to the observations as they are presented.”

      We also note in Methods (pp. 40–41):

      “In the DDM, this scaling of the evidence is equivalent to assuming that the decision variable accumulates momentary evidence of the form (x1 + x2) and then dividing the bound height by the appropriate scale factor. An alternative approach would be to scale both the signal and noise components of the DDM by the scale factor. However, scaling the bound is both simpler and maintains the conventional interpretation of the DDM parameters in which the bound reflects the decision-related components of the evidence accumulation process, and the drift rate represents sensory-related components.”
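      The equivalence described in this passage can be checked numerically. The sketch below uses hypothetical values throughout (Gaussian summed observations, a linearly collapsing bound, and a scale factor k = 1/(1 + rho) with rho = 0.6, which is an assumed weight rather than the manuscript's exact factor). It shows that accumulating k * (x1 + x2) against a bound b(t) yields the same choice and crossing step (up to floating-point rounding) as accumulating (x1 + x2) against b(t) / k, for any k > 0.

```python
import numpy as np


def first_passage(increments, bound):
    """Accumulate increments; return (choice, step index) at the first crossing
    of +/-bound, or (0, -1) if no crossing occurs.
    `bound` may be a scalar or an array with one entry per step."""
    dv = np.cumsum(increments)
    hits = np.flatnonzero(np.abs(dv) >= bound)
    if hits.size == 0:
        return 0, -1
    i = int(hits[0])
    return int(np.sign(dv[i])), i


rng = np.random.default_rng(1)
n_steps = 200
t = np.arange(n_steps)
bound = np.maximum(3.0 - 0.01 * t, 0.05)  # hypothetical collapsing bound
k = 1.0 / (1.0 + 0.6)                      # hypothetical scale factor

mismatches = 0
for _ in range(1000):
    s = rng.normal(0.1, 1.0, size=n_steps)  # hypothetical summed observations x1 + x2
    a = first_passage(k * s, bound)         # scale the evidence, keep the bound
    b = first_passage(s, bound / k)         # keep the evidence, rescale the bound
    mismatches += a != b
print("mismatched trials:", mismatches)
```

      Because the comparison |k * sum| >= b(t) is the same inequality as |sum| >= b(t) / k, the argument holds step by step, so it also applies to collapsing bounds.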

      We believe we provide strong evidence that participants adjust their evidence weighing to account for the correlations (see response below), but we remain agnostic as to how exactly this weighing is implemented in the brain.

      Modelling focus: Related to the previous point, it is stated that participants' choice and RT patterns across correlation conditions were qualitatively consistent with bound adjustments (p.20), but evidence for this claim is limited. Bound adjustments imply effects on both accuracy and RTs, but the data here show either only effects on RTs, or RT effects mixed with accuracy trends that are in the opposite direction to what would be expected from bound adjustment (i.e. slower RT with a trend toward diminished accuracy in the strong negative correlation condition; Figure 3b). Allowing both drift rate and bound to vary with correlation conditions allowed the model to provide a better account of the data in the strong correlation conditions, but from what I can tell this is not consistent with the authors' preregistered hypotheses, and they rely on a post hoc explanation that is necessarily speculative and cannot presently be tested (that the diminished drift rates for higher negative correlations are due to imperfect mapping between subjective evidence strength and the experimenter-controlled adjustment to objective evidence strengths to account for effects of correlations). In my opinion, there are other candidate explanations for the observed effects that could be tested but lie outside of the relatively narrow focus of the current modelling efforts. Both explanations, which are not mutually exclusive, arise from aspects of the task. The first is that an interesting aspect of this task, which contrasts with most common 'univariate' perceptual decision-making tasks, is that participants need to integrate two pieces of information at a time, which may or may not require an additional computational step (e.g. averaging of two spatial locations before adding a single quantum of evidence to the building decision variable). There is abundant evidence that such intermediate computations on the evidence can give rise to certain forms of bias in the way that evidence is accumulated (e.g. 'selective integration' as outlined in Usher et al., 2019, Current Directions in Psychological Science; Luyckx et al., 2020, Cerebral Cortex), which may affect RTs and/or accuracy on the current task. The second candidate explanation is that participants in the current study were only given 200 ms to process and accumulate each pair of evidence samples, which may create a processing bottleneck causing certain pairs or individual samples to be missed (and which, assuming fixed decision bounds, would presumably selectively affect RT and not accuracy). If I were to speculate, I would say that both factors could be exacerbated in the negative correlation conditions, where pairs of samples will on average be more 'conflicting' (i.e. further apart) and, speculatively, more challenging to process in the limited time available here to participants. Such possibilities could be tested through, for example, an interrogation paradigm version of the current task, which would allow the impact of individual pairs of evidence samples to be more straightforwardly assessed, and by assessing the impact of varying inter-sample intervals on the behavioural effects reported presently.

      We thank the reviewer for this thoughtful and valuable feedback. We have thoroughly updated the modeling section to include new analysis and clearer descriptions and interpretations of our findings (including Figs. 5–7 and additional references to the Usher, Luyckx, and other studies that identified decision suboptimalities). The comment about “an additional computational step” in converting the observations to evidence was particularly useful, in that it made us realize that we were making what we now consider to be a faulty assumption in our version of the DDM. Specifically, we assumed that subjective misestimates of the correlation affected how observations were converted to evidence (logLR) to form the decision (implemented as a scaling of the bound height), but we neglected to consider how suboptimalities in encoding the observations could also lead to misestimates of the correlation. We have retained the previous best-fitting models in the text, for comparison (the “bound-rho-hat” and “bound-rho-hat + drift” models). In addition, we now include a “full-rho-hat” model that assumes that misestimates of rho affect both the encoding of the observations, which affects the drift rate and bound height, and the weighing of the evidence, which affects only the bound height. This was the best-fitting model for most participants (after accounting for different numbers of parameters associated with the different models we tested). Note that the full-rho-hat model predicts the lack of correlation-dependent choice effects and the substantial correlation-dependent RT effects that we observed, without requiring any additional adjustments to the drift rate (as we resorted to previously).

      In summary, we believe that we now have a much more parsimonious account of our data, in terms of a model in which subjective estimates of the correlation are alone able to account for our patterns of choice and RT data. We fully agree that more work is needed to better understand the source of these misestimates but also think those questions are outside the scope of the present study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      A few minor comments:

      (1) Evidence can be correlated in multiple ways. It could be correlated within individual pieces of evidence in a sequence, or across elements in that sequence (e.g., across time). This distinction is important, as it determines how evidence ought to be accumulated across time. In particular, if evidence is correlated across time, simply summing it up might be the wrong thing to do. Thus, it would be beneficial to make this distinction in the Introduction, and to mention that this paper is only concerned with the first type of correlation.

      We now clarify this point in the Introduction (pp. 5–6).

      (2) It is unclear without reading the Methods how the blue dashed line in Figure 4c is generated. To my understanding, it is a prediction of the naive DDM model. Is this correct?

      We now specify the models used to make the predictions shown in Fig. 4c (which now includes an additional model that uses unscaled observations as evidence).

      (3) In Methods, given the importance of the distribution of x1 + x2, it would be useful to write it out explicitly, e.g., x1 + x2 ~ N(2 mu_g, ..), specifying its mean and its variance.
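      Under the simplifying assumption (ours, not necessarily the manuscript's) that both cue locations have mean mu_g, with the sign of mu_g determining the correct choice, common standard deviation sigma, and correlation rho, the requested distribution follows from the variance of a sum of correlated variables, and the logLR of the sum follows directly:

```latex
\begin{aligned}
x_1 + x_2 &\sim \mathcal{N}\!\left(2\mu_g,\; \sigma^2 + \sigma^2 + 2\rho\sigma^2\right)
           = \mathcal{N}\!\left(2\mu_g,\; 2\sigma^2(1+\rho)\right),\\[4pt]
\log\mathrm{LR}(x_1 + x_2)
  &= \log\frac{p(x_1 + x_2 \mid +\mu_g)}{p(x_1 + x_2 \mid -\mu_g)}
   = \frac{(x_1 + x_2 + 2\mu_g)^2 - (x_1 + x_2 - 2\mu_g)^2}{4\sigma^2(1+\rho)}
   = \frac{2\mu_g\,(x_1 + x_2)}{\sigma^2(1+\rho)}.
\end{aligned}
```

      Under these assumptions, the weight on each summed pair grows as rho becomes more negative and shrinks as rho becomes more positive.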

      Excellent suggestion, added to p. 38.

      (4) From Methods and the caption of Figure 6 - Supplement 1 it becomes clear that the fitted DDM features a bound that collapses over time. I think that this should also be mentioned in the main text, as it is a not-too-unimportant feature of the model.

      Excellent suggestion, added to p. 15, with reference to Fig. 6-supplement 1 on p. 20.

      (5) The functional form of the bound is 2 (B - tb t). To my understanding, the effective B changes as a function of the correlation magnitude. Does tb as well? If not, wouldn't it be better if it does, to ensure that 2 (B - tb t) = 0 independent of the correlation magnitude?

      In our initial modeling, we also considered whether the correlation-dependent adjustment, which is a function of both correlation sign and magnitude, should be applied to the initial bound or to the instantaneous bound (i.e., after collapse, affecting tb as well). In a pilot analysis of data from 22 participants in the 0.6 correlation-magnitude group, we found that this choice had a negligible effect on the goodness-of-fit (ΔAIC = -0.9, protected exceedance probability = 0.63, in favor of the instantaneous bound scaling). We therefore used the instantaneous bound version in the analyses reported in the manuscript, but based on these results we doubt that this choice was critical. We have clarified our implementation of the bound in Methods (pp. 43–44).
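      The difference between the two options discussed here can be sketched with the bound form 2(B - tb t) from the question above. The factor c stands in for a hypothetical correlation-dependent adjustment, and the values of B, tb, and c are arbitrary. Scaling only the initial height shifts the time at which the bound reaches zero to c*B/tb, whereas scaling the instantaneous bound, 2c(B - tb t), leaves the collapse time at B/tb for every c.

```python
import numpy as np


def bound_initial_scaled(t, B, tb, c):
    """Scale only the starting height: 2 * (c*B - tb*t), floored at 0."""
    return np.maximum(2.0 * (c * B - tb * t), 0.0)


def bound_instantaneous_scaled(t, B, tb, c):
    """Scale the whole collapsing bound: 2 * c * (B - tb*t), floored at 0."""
    return np.maximum(2.0 * c * (B - tb * t), 0.0)


def collapse_time(bound_fn, B, tb, c, t_max=5.0, dt=0.001):
    """First grid time at which the bound has collapsed to 0 (inf if never)."""
    t = np.arange(0.0, t_max, dt)
    idx = np.flatnonzero(bound_fn(t, B, tb, c) <= 0.0)
    return float(t[idx[0]]) if idx.size else float("inf")


B, tb = 1.5, 0.5  # hypothetical: with c = 1 the bound reaches 0 at t = 3 s
for c in (0.8, 1.0, 1.25):  # hypothetical correlation-dependent factors
    print(f"c={c:.2f}  initial-only: {collapse_time(bound_initial_scaled, B, tb, c):.3f} s  "
          f"instantaneous: {collapse_time(bound_instantaneous_scaled, B, tb, c):.3f} s")
```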

      Reviewer #2 (Recommendations for the authors):

      In addition to the points raised above, I have some minor suggestions/open questions that arose from my reading of the manuscript:

      (1) Are the predictions outlined in the paper specific to cases where the two sources are symmetric around zero? If distributions are allowed to be asymmetric then one can imagine cases (i.e. when distribution means are sufficiently offset from one another) where positive correlations can increase evidence strength and negative correlations decrease evidence strength. There's absolutely still value and much elegance in what the authors are showing with this work, but if my intuition is correct, it should ideally be acknowledged that the predictions are restricted to a specific set of generative circumstances.

      We agree that there are a lot of ways to manipulate correlations and their effect on the weight of evidence. At the end of the Discussion, we emphasize that our results apply to this particular form of correlation (p. 32).

      (2) Isn't Figure 4C misleading in the sense that it collapses across the asymmetry in the effect of negative vs positive correlations on RT, which is clearly there in the data and which simply adjusting the correlation-dependent scale factor will not reproduce?

      We agree that this analysis does not address any asymmetries in suboptimal estimates of positive versus negative correlations. We believe that those effects are much better addressed using the model fitting, which we present later in the Results section. We have now simplified the analyses in Fig. 4c, reporting the difference in RT between positive and negative correlation conditions instead of a linear regression.

      (3) I found the transition on p.17 of the Results section from the scaling of drift rate by correlation to scaling of bound height to be quite abrupt and unclear. I suspect that many readers coming from a typical DDM modelling background will be operating under the assumption that drift rate and bound height are independent, and I think more could be done here to explain why scaling one parameter by correlation in the present case is in fact directly equivalent to scaling the other.

      Thank you for the very useful feedback; we have substantially revised this text to make these points more clearly.

      (4) P.3, typo: Alan *Turing*

      That’s embarrassing. Fixed.

      (5) P.27, typo: "participants adopt a *fixed* bound"

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study presents valuable findings related to seasonal brain size plasticity in the Eurasian common shrew (Sorex araneus), which is an excellent model system for these studies. The evidence supporting the authors' claims is convincing. However, the authors should be careful when applying the term adaptive to the gene expression changes they observe; it would be challenging to demonstrate the differential fitness effects of these gene expression changes. The work will be of interest to biologists working on neuroscience, plasticity, and evolution.

      We appreciate the reviewers’ suggestions and comments. For the phylogenetic ANOVA, we used EVE, which tests for a separate RNA expression optimum specific to the shrew lineage, consistent with expectations for adaptive evolution of gene expression. But, as you noted, while this analysis highlights many candidate genes evolving in a manner consistent with positive selection, further functional validation is required to confirm if and how these genes contribute to Dehnel’s phenomenon. In the discussion, we now emphasize that inferred adaptive expression of these genes is putative and outline that future studies are needed to test the function of proposed adaptations. For example, cell-line validation of the effect of BCL2L1 on apoptosis is a case study that tests the function of a putatively adaptive change in gene expression, and it illuminates this limitation. We have also refined our discussion to focus more on pathway-level analyses rather than on individual genes, and have addressed other issues raised, including clarity of methods and the use of sex as a covariate in our analyses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, Thomas et al. set out to study seasonal brain gene expression changes in the Eurasian common shrew. This mammalian species is unusual in that it does not hibernate or migrate but instead stays active all winter while shrinking and then regrowing its brain and other organs. The authors previously examined gene expression changes in two brain regions and the liver. Here, they added data from the hypothalamus, a brain region involved in the regulation of metabolism and homeostasis. The specific goals were to identify genes and gene groups that change expression with the seasons and to identify genes with unusual expression compared to other mammalian species. The reason for this second goal is that genes that change with the season could be due to plastic gene regulation, where the organism simply reacts to environmental change using processes available to all mammals. Such changes are not necessarily indicative of adaptation in the shrew. However, if the same genes are also expression outliers compared to other species that do not show this overwintering strategy, it is more likely that they reflect adaptive changes that contribute to the shrew's unique traits.

      The authors succeeded in implementing their experimental design and identified significant genes in each of their specific goals. There was an overlap between these gene lists. The authors provide extensive discussion of the genes they found.

      The scope of this paper is quite narrow, as it adds gene expression data for only one additional tissue compared to the authors' previous work in a 2023 preprint. The two papers even use the same animals, which had been collected for that earlier work. As a consequence, the current paper is limited in the results it can present. This is somewhat compensated by an expansive interpretation of the results in the discussion section, but I felt that much of this was too speculative. More importantly, there are several limitations to the design, making it hard to draw stronger conclusions from the data. The main contribution of this work lies in the generated data and the formulation of hypotheses to be tested by future work.

      Thank you for your interest in our manuscript and for your insights. We address your comments below. We now highlight the limitations of our study design in the discussion and emphasize that, while a second optimum of gene expression in shrews is consistent with adaptive evolution, not all sources of variation in gene expression can be fully accounted for. We highlight the putative nature of these results in our revisions, especially in our new limitations section (lines 541-555).

      Strengths:

      The unique biological model system under study is fascinating. The data were collected in a technically sound manner, and the analyses were done well. The paper is overall very clear, well-written, and easy to follow. It does a thorough job of exploring patterns and enrichments in the various gene sets that are identified.

      I specifically applaud the authors for doing a functional follow-up experiment on one of the differentially expressed genes (BCL2L1), even if the results did not support the hypothesis. It is important to report experiments like this and it is terrific to see it done here.

      We are glad to hear that you found our manuscript fascinating and clearly written. While we hoped to see an effect of BCL2L1 on apoptosis as proposed, we agree that reporting null results is valuable when validating evolutionary inferences.

      Weaknesses:

      While the paper successfully identifies differentially expressed seasonal genes, the real question is (as explained by the authors) whether these are evolved adaptations in the shrews or whether they reflect plastic changes that also exist in other species. This question was the motivation for the inter-species analyses in the paper, but in my view, these cannot rigorously address this question. Presumably, the data from the other species were not collected in environments comparable to those experienced by the shrews studied here. Instead, they likely (it is not specified, and might not be knowable for the public data) reflect baseline gene expression. To see why this is problematic, consider this analogy: if we were to compare gene expression in the immune system of an individual undergoing an acute infection to other, uninfected individuals, we would see many strong expression differences. However, it would not be appropriate to claim that the infected individual has unique features; the relevant physiological changes are simply not triggered in the other individuals. The same applies here: it is hard to draw conclusions from comparing seasonal expression data in the shrews to non-seasonal data in the other species, as shrew outlier genes might still reflect physiological changes that weren't active in the other species.

      There is no solution for this design flaw given the public data available to the authors except for creating matched data in the other species, which is of course not feasible. The authors should acknowledge and discuss this shortcoming in the paper.

      Thank you for taking the time to provide such insightful feedback. As you noted, while shrews experience seasonal size changes, their environments may differ from those of the other species used in this analysis, leading to increased or decreased expression of certain genes and reducing our ability to accurately detect selection across the phylogeny. Although we sought to control for as many sources of variation as possible, such as using only post-pubescent, wild, or non-domesticated individuals when feasible, we recognize that not all sources of variation can be fully accounted for within a practical experiment. We agree that these sources of variation can introduce both false positives and false negatives into our results, and we have now highlighted this limitation within our discussion (lines 538-552).

      Related to the point above: in the section "Evolutionary Divergence in Expression" it is not clear which of the shrew samples were used. Was it all of them, or only those from winter, fall, etc? One might expect different results depending on this. E.g., there could be fewer genes with inferred adaptive change when using only summer samples. The authors should specify which samples were included in these analyses, and, if all samples were used, conduct a robustness analysis to see which of their detected genes survive the exclusion of certain time points.

      Thank you for this attention to detail. We used spring adults for this analysis. We made this decision because we used only post-pubescent individuals for all species in the analysis, and spring was the only season in which adult shrews were undergoing Dehnel’s phenomenon. We have now clarified this in both the methods and results (line 247 and line 667).

      In the same section, were there also genes with lower shrew expression? None are mentioned in the text, so did the authors not test for this direction, or did they test and there were no significant hits?

      We did test for decreased shrew expression compared to the rest of the species, but no genes showed a significant decrease. We hypothesize two potential reasons for this result: 1) if a gene were under selection for decreased expression, selection for constitutive expression of that gene across all species may be weak, so low expression would be found in other lineages as well; or 2) decreased or absent expression may relax selection on the coding regions, so these genes would not be recovered when we identify 1:1 orthologs. This is consistent with results reported in the original methods manuscript. Thank you for pointing out that we did not discuss this in the text; we now include it in our results (lines 250-251).

      The Discussion is too long and detailed, given that it can ultimately only speculate about what the various expression changes might mean. Many of the specific points made (e.g. about the blood-brain-barrier being more permissive to sensing metabolic state, about cross-organ communication, the paragraphs on single, specific genes) are a stretch based on the available data. Illustrating this point, the one follow-up experiment the authors did (on BCL2L1) did not give the expected result. I really applaud the authors for having done this experiment, which goes beyond typical studies in this space. At the same time, its result highlights the dangers of reading too much into differential expression analyses.

      We agree with your point: while our extensive discussion is useful for generating hypotheses for future testing, some of it may ultimately be too speculative for our readers. To address this, we have shortened portions of our discussion and focused more on pathways than on individual genes, including removing mechanisms related to HRH2, FAM57B, GPR3, and GABAergic neurons. We hope that this makes the speculative nature of many of our results clear to the reader.

      There is no test of whether the five genes observed in both analyses (seasonal change and inter-species) exceed the number expected by chance. When two gene sets are drawn at random, some overlap is expected randomly. The expected overlap can be computed by repeated draws of pairs of random sets of the same size as seen in real data and by noting the overlap between the random pairs. If this random distribution often includes sets of five genes, this weakens the conclusions that can be drawn from the genes observed in the real data.

      Thank you for highlighting this approach; it is greatly needed. After running this test, we found that the observed overlap exceeded the expected overlap but was not significant. We now show this in our methods (lines 277-278) and results (lines 719-720).
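
      The permutation test the reviewer describes can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes both gene lists are drawn from one shared universe of genes tested in both analyses, and the universe and set sizes in the example run are hypothetical placeholders; only the observed overlap of five genes comes from the text.

```python
import random
from statistics import mean


def null_overlaps(universe_size, size_a, size_b, n_iter=2_000, seed=0):
    """Overlap sizes between pairs of random gene sets of fixed sizes, each drawn
    without replacement from the same universe of genes (labelled 0..universe_size-1)."""
    rng = random.Random(seed)
    genes = range(universe_size)
    overlaps = []
    for _ in range(n_iter):
        set_a = set(rng.sample(genes, size_a))
        set_b = set(rng.sample(genes, size_b))
        overlaps.append(len(set_a & set_b))
    return overlaps


def permutation_p_value(observed, overlaps):
    """One-sided empirical p-value: share of random pairs overlapping at least as much
    as observed, with a +1 correction so the estimate is never exactly zero."""
    hits = sum(1 for o in overlaps if o >= observed)
    return (hits + 1) / (len(overlaps) + 1)


if __name__ == "__main__":
    # HYPOTHETICAL sizes; only the observed overlap (5 genes) comes from the response.
    null = null_overlaps(universe_size=10_000, size_a=200, size_b=150)
    print(mean(null))  # close to 200 * 150 / 10_000 = 3
    print(permutation_p_value(5, null))
```

      The mean of the null distribution approximates the hypergeometric expectation size_a * size_b / universe_size, so the choice of universe (for example, all expressed genes versus only 1:1 orthologs) directly shifts the expected overlap and is worth stating explicitly.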

      Reviewer #2 (Public review):

      Summary:

      Shrews go through winter by shrinking their brain and most organs, then regrow them in the spring. The gene expression changes underlying this unusual brain size plasticity were unknown. Here, the authors looked for potential adaptations underlying this trait by looking at differential expression in the hypothalamus. They found enrichments for DE in genes related to the blood-brain barrier and calcium signaling, and used comparative data to look at gene expression differences that are unique in shrews. This study leverages a fascinating organismal trait to understand plasticity and what might be driving it at the level of gene expression. This manuscript also lays the groundwork for further developing this interesting system.

      We are glad you found our manuscript interesting, and thank you for your feedback. We hope that we have addressed all of your concerns as described below.

      Strengths:

      One strength is that the authors used OU models to look for adaptation in gene expression. The authors also added cell culture work to bolster their findings.

      Weaknesses:

      I think that there should be a bit more of an introduction to Dehnel's phenomenon, given how much it is used throughout.

      Thank you for this insight. With a lengthy introduction and discussion, we agree that the importance of Dehnel’s phenomenon may have been overshadowed. We have shortened both sections and emphasized the background on Dehnel’s phenomenon in the first two paragraphs of the introduction, allowing this extraordinary seasonal size plasticity to stand out.

      Reviewer #3 (Public review):

      Summary:

      In their study, the authors combine developmental and comparative transcriptomics to identify candidate genes with plastic, canalized, or lineage-specific (i.e., divergent) expression patterns associated with an unusual overwintering phenomenon (Dehnel's phenomenon - seasonal size plasticity) in the Eurasian shrew. Their focus is on the shrinkage and regrowth of the hypothalamus, a brain region that undergoes significant seasonal size changes in shrews and plays a key role in regulating metabolic homeostasis. Through combined transcriptomic analysis, they identify genes showing derived (lineage-specific), plastic (seasonally regulated), and canalized (both lineage-specific and plastic) expression patterns. The authors hypothesize that genes involved in pathways such as the blood-brain barrier, metabolic state sensing, and ion-dependent signaling will be enriched among those with notable transcriptomic patterns. They complement their transcriptomic findings with a cell culture-based functional assessment of a candidate gene believed to reduce apoptosis.

      Strengths:

      The study's rationale and its integration of developmental and comparative transcriptomics are well-articulated and represent an advancement in the field. The transcriptome, known for its dynamic and plastic nature, is also influenced by evolutionary history. The authors effectively demonstrate how multiple signals (evolutionary, constitutive, and plastic) can be extracted, quantified, and interpreted. The chosen phenotype and study system are particularly compelling: the system not only exemplifies an extreme case of Dehnel's phenomenon, but the metabolic requirements of the shrew also suggest that genes regulating metabolic homeostasis are under strong selection.

      Weaknesses:

      (1) In a number of places (described in detail below), the motivation for the experimental, analytical, or visualization approach is unclear and may obscure or prevent discoveries.

      Thank you for finding our research and manuscript compelling, and for the valuable feedback that will greatly improve our manuscript. We hope that our responses below alleviate your concerns.

      (2) Temporal Expression - Figure 1 and Supplemental Figure 2 and associated text:

      - It is unclear whether quantitative criteria were used to distinguish "developmental shift" clusters from "season shift" clusters. A visual inspection of Supplemental Figure 2 suggests that some clusters (e.g., clusters 2, 8, and to a lesser extent 12) show seasonal variation, not just developmental differences between stages 1 and 2. While clustering helps to visualize expression patterns, it may not be the most appropriate filter in this case, particularly since all "season shift" clusters are later combined in KEGG pathway and GO analyses (Figure 1B).

      - The authors do not indicate whether they perform cluster-specific GO or KEGG pathway enrichment analyses. The current analysis picks up relevant pathways for hypothalamic control of homeostasis, which is a useful validation, but this approach might not fully address the study's key hypotheses.

      Thank you for this valuable feedback. We did not want to include clusters we deemed to be related to development, as these changes should not be attributed to Dehnel’s phenomenon. We initially made this distinction through qualitative, visual inspection, which we realize can differ between observers (e.g., clusters 2, 8, and 12 appeared seasonal to you). Qualitatively, we were looking for extreme divergence between Stage 1 and Stage 5 individuals: both stages are sampled in summer, so if expression within a cluster were related to season rather than development, the averages of these two stages should be relatively similar. We have now quantified this as a large difference in z-score (abs(summer juvenile - summer adult) > 1.25) without meaningful inter-season variation, determined by a second local maximum (abs(autumn - winter) < 0.5 and abs(winter - summer) < 0.5), and added it to both our methods (lines 699-702) and results (line 192).
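
      The quantitative criterion above can be written as a small filter over each cluster's mean z-score per stage. This is a sketch under stated assumptions: the stage keys are hypothetical labels, and because the response does not say which summer group enters the abs(winter-summer) comparison, the code assumes summer adults.

```python
def is_developmental_shift(profile, dev_threshold=1.25, season_threshold=0.5):
    """True if a cluster's mean z-score profile differs strongly between summer
    juveniles and summer adults but shows no meaningful inter-season variation."""
    developmental_change = (
        abs(profile["summer_juvenile"] - profile["summer_adult"]) > dev_threshold
    )
    # ASSUMPTION: the "winter-summer" comparison is read as winter vs. summer adults.
    seasonally_flat = (
        abs(profile["autumn"] - profile["winter"]) < season_threshold
        and abs(profile["winter"] - profile["summer_adult"]) < season_threshold
    )
    return developmental_change and seasonally_flat


def season_candidate_clusters(cluster_profiles):
    """Cluster ids NOT flagged as developmental shifts, i.e. candidates to keep
    for the seasonal pathway analyses in this sketch."""
    return [cid for cid, prof in cluster_profiles.items()
            if not is_developmental_shift(prof)]
```

      Writing the rule this way makes the thresholds explicit parameters, so their sensitivity can be checked by rerunning the filter with other values.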

      Regarding the combination of clusters for pathway enrichment compared to individual clusters, we agree that combined clusters may be more informative about overall homeostasis, whereas individual clusters may inform us about processes directly related to Dehnel’s phenomenon. Initially, we were hesitant to conduct this analysis, as individual clusters contain small gene sets, reducing the ability to detect pathway enrichments. We have now included this analysis, which is reported in our methods (lines 703-704), results (lines 203-204), and a new supplemental table.

      (3) Differential expression between shrinkage (stage 2) and regrowth (stage 4) and cell culture targets

      - The rationale for selecting BCL2L1 for cell culture experiments should be clarified. While it is part of the apoptosis pathway, several other apoptosis-related genes were identified in the differential gene expression (DGE) analysis, some showing stronger differential expression or shrew-specific branch shifts. Why was BCL2L1 prioritized over these other candidates?

      We agree that our rationale for validating BCL2L1 function in neural cell lines was not clearly explained in the manuscript. We selected BCL2L1 because it is the furthest downstream gene in the apoptotic pathway, thus making it the most directly involved gene in programmed cell death, whereas upstream genes could influence additional genes or alternative processes. We have clarified this choice in the revised methods section (lines 748-750).

      - The authors mention maintaining (or at least attempting to maintain) a 1:1 sex ratio for the comparative analysis, but it is unclear if this was also done for the S. araneus analysis. If not, why? If so, was sex included as a covariate (e.g., a random effect) in the differential expression analysis? Sex-specific expression elevates within-group variation and could affect the discovery of differentially expressed genes.

      Regarding the use of sex as a covariate, we acknowledge the concerns raised. In our evolutionary analyses, we maintained a balanced sex ratio within species when possible. EVE models handle the effect of sex on gene expression as intraspecific variation. In shrews, however, we used males exclusively, as females were only found among juvenile individuals. Including those juvenile females would have introduced age effects, with perhaps a larger effect on our results. For the seasonal data, we have now included sex as a covariate in differential expression analyses. However, our design is imbalanced in relation to sex, which we have now discussed in our methods (lines 713-714) and discussion limitations (lines 544-548).

      (4) Discussion: The term "adaptive" is used frequently and liberally throughout the discussion. The interpretation of seasonal changes in gene expression as indicators of adaptive evolution should be done cautiously as such changes do not necessarily imply causal or adaptive associations.

      Thank you for this insight. We have reviewed our discussion and clarified that adaptations are putative (e.g., lines 146, 285, and 332), and we have highlighted this in our limitations section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I would recommend always spelling out "Dehnel's phenomenon" or even replacing this term (after crediting the DP term) with the more informative "seasonal size plasticity". Every time I saw "DP", I had to remind myself what this referred to. If the authors choose not to do so, please use the acronym consistently (e.g. line 186 has it spelled out).

      We have replaced the acronym DP with either the full term or the more informative “seasonal size plasticity” throughout the text.

      (2) Line 202: "DEG" has not been defined. Simply add to the line before.

      Thank you for this attention to detail. We have defined it in the preceding line (line 210).

      (3) Please add a reference for the "AnAge" tool that was used to determine if samples were pubescent.

      Thank you for identifying this oversight. We have now cited the proper paper in line 634.

      (4) In the BCL2L1 section in the results, add a callout to Figure 2D.

      We have now added a callout to Figure 2D within the results (line 234).

      Reviewer #2 (Recommendations for the authors):

      (1) Line 122: is associated? These adaptations?

      Thank you for identifying that we were missing the words “associated with” here. We have fixed this in the revision.

      (2) The first paragraph of the Results should be moved to the methods, except maybe the number of orthologs.

      Thank you for this insight. We have removed this portion from the results section.

      (3) Why a Bonferroni correction on line 188? That seems too strict.

      We agree the Bonferroni correction is strict. Results obtained with less strict multiple-testing corrections, including methods that control the false discovery rate, are also not significant after correction. These corrections are included in the accompanying data; however, we report only the Bonferroni correction.
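
      To illustrate why Bonferroni is stricter than false discovery rate control, the sketch below compares Bonferroni-adjusted p-values with Benjamini-Hochberg adjusted p-values, one common FDR procedure. The response does not say which FDR method was tried, so Benjamini-Hochberg is an assumption here, and the example p-values are made up.

```python
def bonferroni(pvals):
    """Family-wise error control: multiply each p-value by the number of tests (capped at 1)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """FDR control: p * m / rank, made monotone by a running minimum from the largest rank down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    example = [0.01, 0.02, 0.03, 0.04]  # made-up p-values
    print(bonferroni(example))          # about [0.04, 0.08, 0.12, 0.16]
    print(benjamini_hochberg(example))  # about 0.04 for all four
```

      Bonferroni scales every p-value by the full number of tests, while Benjamini-Hochberg scales by m / rank, so its adjusted values are never larger; a result that is not significant under an FDR procedure will therefore not be significant under Bonferroni either.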

      (4) Line 427: "is a novel candidate gene for several neurological disorders" needs some references. I see them a couple of sentences later, but that's quite a sentence with no references at the end.

      We have added the proper citations for this sentence (line 524).

      Reviewer #3 (Recommendations for the authors):

      (1) Temporal Expression - Figure 1 and Supplemental Figure 2 and associated text Line176-193:

      - The authors report the total number of genes meeting inclusion criteria (>0.5-fold change between any two stages and 2 samples >10 normalized reads), but it would be more informative to also provide the number of genes within each temporal cluster. This would offer a clearer understanding of how gene expression patterns are distributed over time.

      Unfortunately, this information is difficult to depict on our figure and would take up too much space in the text. We have therefore added a new supplemental table with this information.

      - It is unclear whether quantitative criteria were used to distinguish "developmental shift" clusters from "season shift" clusters. A visual inspection of Supplemental Figure 2 suggests that some clusters (e.g., clusters 2, 8, and to a lesser extent 12) show seasonal variation, not just developmental differences between stages 1 and 2. While clustering helps to visualize expression patterns, it may not be the most appropriate filter in this case, particularly since all "season shift" clusters are later combined in KEGG pathway and GO analyses (Fig. 1B). Using a differential gene expression criterion might be more suitable. For example, do excluded genes show significant log-fold differences between late-stage comparisons?

      As previously mentioned, we have now quantified the criterion separating developmental from seasonal shifts as large differences in z-score (abs(summer juveniles - summer adults) > 1.25) without meaningful inter-season variation, determined by a second local maximum (abs(autumn - winter) < 0.5 and abs(winter - summer) < 0.5), and added it to our methods (lines 699-702). We then follow this with differential expression analyses, as described in Figure 2.

      - Did the authors perform cluster-specific GO or KEGG pathway enrichment analyses instead of focusing on the combined set of genes across the season shift clusters? While I understand that the small number of genes in each cluster may be limiting, if pathways emerge from cluster-specific analysis, they could provide more detailed insights into the functional significance of these temporal expression patterns. The current analysis picks up relevant pathways for hypothalamic control of homeostasis, which is a useful validation, but this approach might not fully address the study's key hypotheses. Additionally, no corrections for multiple hypothesis testing were applied, as noted in the results. A more refined gene set (e.g., using differential expression criteria, described above) could be more appropriate for these analyses.

      We have now included cluster-specific KEGG enrichments as previously described.

      (2) Differential expression between shrinkage (stage 2) and regrowth (stage 4) and cell culture targets - Figure 2 and lines195-227:

      - The rationale for selecting BCL2L1 for cell culture experiments should be clarified. While it is part of the apoptosis pathway, several other apoptosis-related genes were identified in the differential gene expression (DGE) analysis, some showing stronger differential expression or shrew-specific branch shifts. Why was BCL2L1 prioritized over these other candidates?

      We have now included the reasoning for further validation of BCL2L1 as described above.

      - The relevance of the "higher degree" differentially expressed genes needs more explanation. Although this group of genes is highlighted in the results, they are not featured in any subsequent analyses, leaving their importance unclear.

      Thank you for this insight. We have removed this from the methods as it is not relevant to subsequent analyses or conclusions.

      - The authors mention maintaining (or at least attempting to maintain) a 1:1 sex ratio for the comparative analysis (Line 525), but it is unclear if this was also done for the S. araneus analysis. If so, was sex included as a covariate (e.g., a random effect) in the differential expression analysis?

      We have now incorporated information on sex as described above.

      (3) Discussion:

      The term "adaptive" is used frequently and liberally throughout the discussion, but the authors should be cautious in interpreting seasonal changes in gene expression as indicators of adaptive evolution. Such changes do not necessarily imply causal or adaptive associations, and this distinction should be clearly stated when discussing the results.

      Thank you for this feedback; we agree with your conclusion. While a second expression optimum in the shrew lineage is indicative of adaptive expression, we cannot fully determine whether it is caused by genetic or environmental factors, despite careful attention to experimental design. We have highlighted this as a limitation in the discussion.

      (4) Minor Editorial Comment:

      Line 105: "... maintenance of an energy budgets..." delete "an"

      We have corrected this grammatical error.

    1. Author response:

      Reviewer #1:

      Strengths:

      (1) Using a fairly generic ecological model, the method can identify the change in the relative importance of different ecological forces (distribution of interspecies interactions, demographic noise, and immigration) in different sample groups. The authors focus on the case of the human gut microbiota, showing that the data are consistent with a higher influence of species interactions (relative to demographic noise and immigration) in a disease microbiota state than in healthy ones. (2) The method is novel, original, and it improves the state-of-the-art methodology for the inference of ecologically relevant parameters. The analysis provides solid evidence for the conclusions. 

      Weaknesses:

      In the way it is written, this work might be mostly read by physicists. We believe that, with some rewriting, the authors could better highlight the ecological implications of the results and make the method more accessible to a broader audience.

      We thank the reviewer for their positive and constructive feedback. We particularly appreciate the recognition of the novelty and robustness of our method, as well as the insight that it sheds light on the shifting ecological forces between healthy and diseased microbiomes. In response to the concern about the manuscript’s accessibility, we aim to revise key sections – including the Introduction, Results, and Discussion – to more clearly articulate the ecological relevance of our theoretical findings. We would like to emphasize that our approach offers a novel perspective for analyzing individual species' abundances, as well as for understanding interaction patterns and stability at the community level. By placing our results within a broader context accessible to readers from diverse backgrounds, we aim for the revised version to appeal to a wider audience, including ecologists and microbiome scientists, while preserving the rigor of our underlying statistical physics framework.

      Reviewer #2:

      Strengths:

      A well-written article, relatively easy to follow and transparent despite the high degree of technicality of the underlying theory. The authors provide a powerful inference procedure, which bypasses the issue of having only compositional data.

      Weaknesses:

      (1) This sentence in the introduction seems key to me: "Focusing on single species properties as species abundance distribution (SAD), it fails to characterise altered states of microbiome." Yet it is not explained what is meant by 'fail', and thus what the proposed approach 'solves'.

      (2) Lack of validation, following arbitrary modelling choices made (symmetry of interactions, weak-interaction limit, uniform carrying capacity). Inconsistent interpretation of instability. Here, instability is associated with the transition to the marginal phase, which becomes chaotic when interaction symmetry is broken. But as the authors acknowledge, the weak interaction limit does not reproduce fat-tailed abundance distributions found in data. On the other hand, strong interaction regimes, where chaos prevails, tend to do so (Mallmin et al., PNAS 2024). Thus, the nature of the instability that unhealthy microbiomes approach is unclear.

      (3) Three technical points about the methodology and interpretation. a) How can order parameters h and q0 be inferred, if in the compositional data they are fixed by definition? b) How is it possible that weaker interaction variance is associated with an approach to instability, when the opposite is usually true? c) Having an idea of how the empirical data compare to the theoretical fits would be valuable.

      Implications: As the authors say, this is a proof of concept. They point at limits and ways to go forward, in particular pointing at ways in which species abundance distributions could be better reproduced by the predicted dynamical models. One implication that is missing, in my opinion, is the interpretability of the results, and what this work achieves that was missing from other approaches (see weaknesses section above): what do we learn from the fact that changes in microbial interactions distinguish healthy from unhealthy microbiota? For instance, what does this mean for medical research?

      We greatly appreciate the reviewer’s thoughtful analysis highlighting both the strengths and areas of ambiguity in our work.

      (1) To clarify the sentence on the limitations of species abundance distributions (SADs), we aim to explain in the revised version that while SADs summarize the relative abundance of individual species, they fail to capture the species-species correlations that we have shown (Seppi et al., Biomolecules 2023) to be more sensitive to the health state of the host. Our method thus focuses on the interaction statistics among species, providing insights into the underlying dynamics and stability of the microbiomes and their differences between healthy and unhealthy hosts.

      (2) Regarding model assumptions, we acknowledge that the weak interaction regime and symmetry hypotheses simplify the analysis and may not capture all empirical richness, such as fat-tailed distributions of species abundance. However, we interpret instability not as a path to chaos per se, but as a transition toward a multi-attractor phase, where each microbiome reaches a different fixed point. This is consistent with prior empirical findings invoking the “Anna Karenina principle”, where healthy microbiomes resemble one another, but disease states tend to deviate from this picture (see Pasqualini et al., PLOS Comp. Bio. 2024). We consider our framework as a starting point and agree that further extensions incorporating strong interaction regimes (as suggested by Mallmin et al., PNAS 2024) or relaxing other model assumptions could reveal even richer dynamical patterns. The computational pipeline we present is, in fact, easily generalizable to other population dynamics models.

      On the technical questions: (a) While compositional data constrain relative abundances, we can still estimate diversity-dependent parameters (h and q0) using alpha-diversity statistics across samples, which show meaningful variation; (b) the counter-intuitive instability that the reviewer pointed out arises from the interplay between demographic stochasticity and quenched disorder. It is the combined contribution of these two factors in phase space, not either one alone, that drives the transition (see Figure 1 in Altieri et al., Phys. Rev. Lett. 2021); (c) we plan to include plots that compare empirical data to theoretical model fits. These will help visualize how well the model captures observed microbial community properties, and how, even with larger species interaction heterogeneity (σ) and larger demographic noise (T), healthy communities are more stable (i.e., more distant from the critical line), as measured by the replicon eigenvalue.

      Finally, regarding interpretability and implications: by showing that ecological interaction networks, and not just species identities, differ between healthy and unhealthy states, our work suggests a conceptual shift. This could inform medical strategies aimed at restoring community-level stability rather than targeting individual microbes. In the revised Discussion section, we will elaborate on this point to better highlight its practical implications and outline potential directions for future research.
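
      As a minimal illustration of point (a), the sketch below computes h from a samples-by-species abundance table, assuming h is the across-sample average of 1/S, where S is the number of species detected in a sample (the definition Reviewer #3 gives below). The function name `estimate_h` and the input layout are hypothetical, and q0 is not covered because its exact estimator is not stated in this response. Because only presence or absence enters, h is unaffected by normalizing each sample to relative abundances, which is why it is not fixed by the compositional constraint.

```python
from statistics import mean


def estimate_h(samples):
    """Hypothetical helper: h = mean over samples of 1/S_k.

    `samples` is a list of per-sample abundance vectors (read counts or
    relative abundances); S_k counts the species with nonzero abundance,
    so normalizing a sample to relative abundances leaves h unchanged.
    """
    inverse_richness = []
    for abundances in samples:
        richness = sum(1 for x in abundances if x > 0)  # observed species S_k
        if richness == 0:
            raise ValueError("sample with no detected species")
        inverse_richness.append(1.0 / richness)
    return mean(inverse_richness)


# Two toy samples with 2 and 4 detected species: h = (1/2 + 1/4) / 2
print(estimate_h([[0.5, 0.5, 0.0], [0.25, 0.25, 0.25, 0.25]]))  # 0.375
```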

      Reviewer #3:

      Strengths:

      The modeling efforts of this study primarily rely on a disordered form of the generalized Lotka-Volterra (gLV) model. This model can be appropriate for investigating certain systems, and the authors are clear about when and how more mechanistic models (i.e., consumer-resource) can lead to gLV. Phenomenological models such as this have been found to be highly useful for investigating the ecology of microbiomes, so this modeling choice seems justified, and the limitations are laid out. 

      Weaknesses:

      The authors use metagenomic data of diseased and healthy patients that were first processed in Pasqualini et al. (2024). The use of metagenomic data leads me to a question regarding the role of sampling effort (i.e., read counts) in shaping model parameters such as h. This parameter is equal to the average of 1/(number of species) across samples because the data are compositional in nature. My understanding is that it was calculated using total abundances (i.e., read counts). The number of observed species is strongly influenced by sampling effort, so it would be useful if the number of reads were plotted against the number of species for healthy and diseased subjects. However, the role of sampling effort can depend on the type of data, and my instinct about the role that sampling effort plays in species detection is primarily based on 16S data. The dependency between these two variables may be less severe for the authors' metagenomic pipeline. This potential discrepancy raises a broader issue regarding the investigation of microbial macroecological patterns and the inference of ecological parameters. Often microbial macroecology researchers rely on 16S rRNA amplicon data because that type of data is abundant and comparatively low-cost. Some in microbiology and bioinformatics are increasingly pushing researchers to choose metagenomics over 16S. Sometimes this choice is valid (discovery of new MAGs, investigation of allele frequency changes within species, etc.); sometimes it is driven by the false equivalence "more data = better". The outcome, though, is that we have a body of more-or-less established microbial macroecological patterns which rest on 16S data and are now slowly incorporating results from metagenomics. To my knowledge, there has not been a systematic evaluation of the macroecological patterns that do and do not vary with the choice of 16S vs. metagenomics.

      Several of the authors of this manuscript have previously compared the MAD shape for 16S and metagenomic datasets in Pasqualini et al., but moving forward, a more comprehensive study seems necessary.

      We thank the reviewer for this insightful and nuanced comment, which particularly highlights the broader methodological context of our data sources. Indeed, metagenomic sequencing introduces different biases with respect to 16S data. First, we would like to emphasize that we estimated the order parameters from the data by using relative abundances. Second, while the concern regarding the influence of sequencing depth and species diversity on the estimation of the order parameters is valid, we refer to a previous publication by some of the authors (Pasqualini et al., 2024; see Figure 4, panels g and h). There, we pointed out that the observed outcome is weakly influenced by sequencing depth in our dataset, while the main impact on the order parameters estimate comes from the species diversity of the two groups. In the same publication, we showed that other well-known patterns (species abundance distribution, mean abundance distribution) are also observed. Also, to mitigate the effect of the number of samples and sequencing depth, we estimated the order parameters by a bootstrap procedure (90% of samples for healthy and diseased groups, 5000 resamples), which resulted in the error bars in Figure 2.
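
      The resampling step can be sketched as follows, assuming that each of the 5000 resamples draws 90% of a group's samples without replacement and that the error bar is the standard deviation of the statistic across resamples; neither choice is spelled out above, so both are assumptions. `subsample_bootstrap` and `h_statistic` are hypothetical names used only for illustration, and the procedure would be run separately on the healthy and diseased groups.

```python
import random
from statistics import mean, stdev


def h_statistic(samples):
    # Hypothetical order-parameter estimator: mean of 1/(observed species).
    return mean(1.0 / sum(1 for x in s if x > 0) for s in samples)


def subsample_bootstrap(samples, statistic, frac=0.9, n_resamples=5000, seed=0):
    """Return (mean, standard deviation) of `statistic` over random
    subsamples containing `frac` of the samples, drawn without replacement."""
    rng = random.Random(seed)
    k = max(1, round(frac * len(samples)))
    values = [statistic(rng.sample(samples, k)) for _ in range(n_resamples)]
    return mean(values), stdev(values)


# Toy group: richness varies across samples, so the estimate varies too.
group = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]] * 4
center, error_bar = subsample_bootstrap(group, h_statistic)
print(round(center, 3), round(error_bar, 3))
```

Sampling without replacement keeps every resample free of duplicated subjects; a classical bootstrap would instead draw 100% of the samples with replacement.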

      We also fully agree with the broader call for a systematic comparison of macroecological patterns derived from 16S and metagenomic data. While some of us have already begun exploring this direction (e.g., Pasqualini et al., 2024), the reviewer’s comment highlights its significance and motivates us to pursue a more comprehensive, integrative analysis across data types. While we found qualitative agreement of these patterns with previous publications (e.g., Grilli, Nature Comm. 2020), we will acknowledge this as an important future direction in the Discussion section.

      References

      (1) Seppi, M., Pasqualini, J., Facchin, S., Savarino, E.V. and Suweis, S., 2023. Emergent functional organization of gut microbiomes in health and diseases. Biomolecules, 14(1), p.5.

      (2) Pasqualini, J., Facchin, S., Rinaldo, A., Maritan, A., Savarino, E. and Suweis, S., 2024. Emergent ecological patterns and modelling of gut microbiomes in health and in disease. PLOS Computational Biology, 20(9), p.e1012482.

      (3) Mallmin, E., Traulsen, A. and De Monte, S., 2024. Chaotic turnover of rare and abundant species in a strongly interacting model community. Proceedings of the National Academy of Sciences, 121(11), p.e2312822121.

      (4) Altieri, A., Roy, F., Cammarota, C. and Biroli, G., 2021. Properties of equilibria and glassy phases of the random Lotka-Volterra model with demographic noise. Physical Review Letters, 126(25), p.258301.

      (5) Grilli, J., 2020. Macroecological laws describe variation and diversity in microbial communities. Nature Communications, 11(1), p.4743.

    1. Author response:

      Reviewer 1:

      (1) Clarification of axon mistargeting patterns and model interpretation

      We will clarify the apparent discrepancy between chick and mouse axon mistargeting data. Specifically, we will expand the explanation in the main text and Figure 7 legend and/or revise the model in Figure 7 to better reflect observed phenotypes and clarify how Sp1 overexpression contributes to mistargeting.

      (2) Evidence for Sp1-dependent ephrin expression

      We agree that demonstrating ephrin expression changes in motor neurons is essential. We will:

      Conduct in situ hybridization and/or immunostaining for ephrins in control and Sp1 mutant spinal cords from both chick and mouse embryos.

      Clarify and expand the methodological details of the NSC-34 cell experiments shown in Figure 4G.

      (3) RNA-seq experiment details

      We will revise the Methods section to provide additional experimental details.

      (4) Use of Syn1-cre

      We acknowledge concerns about the broad expression of Syn1-cre. To address this:

      We will clarify our rationale for using Syn1-cre and describe its expression pattern in the spinal cord.

      We are evaluating the feasibility of additional experiments using a motor neuron-specific Cre driver to confirm cell-type specificity.

      We will include a new paragraph in the Discussion addressing potential contributions from other neuronal populations.

      Reviewer 2:

      (1) & (2) Clarification and localization of RNA-seq data

      We will expand the Methods section to provide greater detail on the RNA-seq approach. In addition, we will validate ephrin downregulation in LMC neurons using in situ hybridization and/or immunostaining.

      (3) Integration of ChIP and RNA-seq data

      We will:

      Report additional ChIP peaks for ephrinA5 and other differentially expressed genes such as Sema7a.

      Add a summary figure that integrates ChIP and RNA-seq results to strengthen the link between Sp1 binding and transcriptional regulation.

      (4) Clarification of the cis-attenuation model

      We recognize that our data do not yet directly demonstrate Sp1’s role in cis-attenuation. To address this:

      We will revise the abstract and main text to frame Sp1's role in cis-attenuation as a hypothesis.

      We are exploring the feasibility of ephrinA5 and B2 rescue experiments in Sp1-deficient embryos to test specificity.

      (5) Behavioral phenotypes and cell-type specificity

      We will clarify that behavioral phenotypes may result from combined effects across neuron populations due to Syn1-cre expression. To address this:

      We are planning rescue experiments with Sp1 expression in chick embryos to test for rescue of axon misrouting.

      We will include a new paragraph in the Discussion to highlight this limitation and discuss alternative interpretations.

      Reviewer 3:

      We appreciate your positive evaluation and support for the rigor of our study.

      In response to your suggestions:

      We are revising the manuscript to improve clarity and flow, particularly the transitions between datasets.

      We will update Figure 7 and the associated text to more clearly convey the working model and avoid overinterpretation.

      We thank all reviewers for their constructive feedback and are committed to addressing each point thoroughly. All revisions will be clearly marked in the resubmitted manuscript.

    1. Author response:

      (This author response relates to the first round of peer review by Biophysics Colab. Reviews and responses to both rounds of review are available here: https://sciety.org/articles/activity/10.1101/2023.10.23.563601.)

      General Assessment:

      Pannexin (Panx) hemichannels are a family of heptameric membrane proteins that form pores in the plasma membrane through which ions and relatively large organic molecules can permeate. ATP release through Panx channels during apoptosis is one established biological role of these proteins in the immune system, but they are widely expressed in many cells throughout the body, including the nervous system, and likely play many interesting and important roles that are yet to be defined. Although several structures of different Panx subtypes from different species have now been solved, their biophysical mechanisms remain poorly understood, including which physiological signals control their activation. Electrophysiological measurements of ionic currents flowing in response to Panx channel activation have shown that some subtypes can be activated by strong membrane depolarization or by caspase cleavage of the C-terminus. Here, Henze and colleagues set out to identify endogenous activators of Panx channels, focusing on the Panx1 and Panx2 subtypes, by fractionating mouse liver extracts and screening for activation of Panx channels expressed in mammalian cells using whole-cell patch-clamp recordings. The authors present a comprehensive examination with robust methodologies and supporting data demonstrating that lysophospholipids (LPCs) directly activate Panx1 and Panx2 channels. These methodologies include channel mutagenesis, electrophysiology, ATP release and fluorescence assays, molecular modelling, and cryogenic electron microscopy (cryo-EM). Mouse liver extracts were initially used to identify LPC activators, but the authors go on to individually evaluate many different types of LPCs to determine those that are more specific for Panx channel activation. Importantly, the enzymes that endogenously regulate the production of these LPCs were also assessed, along with other by-products that were shown not to promote pannexin channel activation.

      In addition, the authors used synovial fluid from canine patients, which is enriched in LPCs, to highlight the importance of the findings in pathology. Overall, we think this is likely to be a landmark study because it provides strong evidence that LPCs can function as activators of Panx1 and Panx2 channels, linking two established mediators of inflammatory responses and opening an entirely new area for exploring the biological roles of Panx channels. Although the mechanism of LPC activation of Panx channels remains unresolved, this study provides an excellent foundation for future studies and, importantly, provides clinical relevance.

      We thank the reviewers for their time and effort in reviewing our manuscript. Based on their valuable comments and suggestions, we have made substantial revisions. The updated manuscript now includes two new experiments supporting that lysophospholipid-triggered channel activation promotes the release of signaling molecules critical for immune response and demonstrates that this novel class of agonist activates the inflammasome in human macrophages through endogenously expressed Panx1. To better highlight the significance of our findings, we have excluded the cryo-EM panel from this manuscript. We believe these changes address the main concerns raised by the reviewers and enhance the overall clarity and impact of our findings. Below, we provide a point-by-point response to each of the reviewers’ comments.

      Recommendations:

      (1) The authors present a tremendous amount of data using different approaches, cells and assays, along with a written presentation that is quite abbreviated, which may make comprehension challenging for some readers. We would encourage the authors to expand the written presentation to more fully describe the experiments that were done and how the data were analysed, so that the two key conclusions can be more fully appreciated by readers. A lot of data is also presented in supplemental figures that could be brought into the main figures and more thoroughly presented and discussed.

      We appreciate and agree with the reviewers’ observation. Our initial manuscript may have been challenging to follow due to our use of both wild-type and GS-tagged versions of Panx1 from human and frog origins, combined with different fluorescence techniques across cell types. In this revision, we used only human wild-type Panx1 expressed in HEK293S GnTI- cells, except for activity-guided fractionation experiments, where we used GS-tagged Panx1 expressed in HEK293 cells (Fig. 1). For functional reconstitution studies, we employed YO-PRO-1 uptake assays, as optimizing the Venus-based assay was challenging. We have clarified these exceptions in the main text. We think these adjustments simplify the narrative and ensure an appropriate balance between main and supplemental figures.

      (2) It would also be useful to present data on the ion selectivity of Panx channels activated by LPC. How does this compare to data obtained when the channel is activated by depolarization? If the two stimuli activate related open states, then the ion selectivity may be quite similar, but perhaps not if the two stimuli activate different open states. The authors' earlier work in eLife shows interesting shifts in reversal potentials (Vrev) when substituting external chloride with gluconate but not when substituting external sodium with N-methyl-D-glucamine, and these changed with mutations within the external pore of Panx channels. Related measurements comparing channels activated by LPC with those activated by membrane depolarization would be valuable for assessing whether similar or distinct open states are activated by LPC and voltage. It would be ideal to make Vrev measurements using a fixed step depolarization to open the channel and then various steps to more negative voltages to measure tail currents in pinpointing Vrev (a so-called instantaneous IV).
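
      The instantaneous-IV analysis suggested here reduces to locating the voltage at which the tail current changes sign. A minimal sketch, assuming tail-current amplitudes have already been measured at each repolarization step and that the relation is close to linear over the fitted window; the function name `reversal_potential` and the toy numbers are hypothetical. Comparing the estimate across ionic substitutions (for example, chloride versus gluconate) and between LPC- and depolarization-activated channels would then show whether Vrev shifts.

```python
def reversal_potential(voltages_mv, tail_currents_pa):
    """Estimate Vrev (mV) as the zero crossing of a least-squares line
    fitted to tail-current amplitude versus repolarization voltage."""
    n = len(voltages_mv)
    if n < 2 or n != len(tail_currents_pa):
        raise ValueError("need at least two paired (voltage, current) points")
    v_mean = sum(voltages_mv) / n
    i_mean = sum(tail_currents_pa) / n
    sxx = sum((v - v_mean) ** 2 for v in voltages_mv)
    if sxx == 0:
        raise ValueError("voltages must not all be equal")
    sxy = sum((v - v_mean) * (i - i_mean)
              for v, i in zip(voltages_mv, tail_currents_pa))
    slope = sxy / sxx                  # slope conductance, pA/mV
    intercept = i_mean - slope * v_mean
    return -intercept / slope          # voltage where the fitted current is zero


# Toy tail currents from a linear relation with Vrev = -20 mV.
steps = [-100, -80, -60, -40, -20, 0, 20]
tails = [2.0 * (v + 20) for v in steps]
print(reversal_potential(steps, tails))  # -20.0
```

If the tail IV is visibly rectifying, restricting the fit to the points bracketing the sign change gives a less biased estimate than a global line.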

      We fully agree with the reviewer on the importance of ion selectivity experiments. However, comparing the properties of LPC-activated channels with those activated by membrane depolarization presented technical challenges, as LPC appears to stimulate Panx1 in synergy with voltage. Prolonged LPC exposure destabilizes patches, complicating G-V curve acquisition and kinetic analyses. While such experiments could provide mechanistic insights, we think they are beyond the scope of current study.

      (3) Data is presented for expression of Panx channels in different cell types (HEK vs HEK293S GnTI-) and different constructs (Panx1 vs Panx1-GS vs other engineered constructs). The authors have tried to be clear about what was done in each experiment, but it can be challenging for the reader to keep everything straight. The labelling in Fig 1E helps a lot, and we encourage the authors to use that approach systematically throughout. It would also help to clearly identify the cell type and channel construct whenever showing traces, like those in Fig 1D. Doing this systematically throughout all the figures would also make it clear where a control is missing. For example, if labelling for the type of cell was included in Fig 1D, it would be immediately clear that a GnTI- vector-alone control for WT Panx1 is missing, as the vector control shown is for HEK cells and formally is only a control for Panx2 and Panx3. Can the authors explain why LPC activates Panx1 overexpressed in HEK293 GnTI- cells but not in HEK293 cells? Is this purely a function of expression levels? If so, it would be good to provide that supporting information.

      As mentioned above, we believe our revised version is more straightforward to digest. We have improved labeling and provided explanations where necessary to clarify the manuscript. While Panx1 expression levels are indeed higher in GnTI- than in HEK293 cells, we are uncertain whether the absence of detectable currents in HEK293 cells is solely due to expression levels. Some post-translational modifications that inhibit Panx1, such as lysine acetylation, may also impact activity. Future studies are needed to explore these mechanisms further.

      (4) The mVenus quenching experiments are somewhat confusing in the way data are presented. In Fig 2B the y axis is labelled fluorescence (%), but when the channel is closed at time = 0 the fluorescence value is 0 rather than 100%, and as the channel opens when LPC is added the values grow towards 100 instead of towards 0 as iodide permeates and quenches. It would be helpful if these types of data could be presented more intuitively. Also, how was the initial rate plotted in Fig 2C calculated? It would be helpful to show how this is done in a figure panel somewhere. Why was the initial rate expressed as a percent of maximum, what is the maximum, and why are the values so low? Why is the effect of CBX so weak in these quenching experiments with Panx1 compared to other assays? This assay is used in a lot of experiments, so anything that could be done to bolster confidence in what it reports would be valuable to readers. Bringing in as many of the control experiments that have been done as possible, including any that are already published, would be helpful.

      We modified the Y-axis in Figure 2 to “Quench (%)” for clarity. The data reflect the fluorescence reduction over time, starting from LPC addition, normalized to the maximal decrease observed after Triton X-100 addition (3 minutes), enabling consistent comparisons of quenching values. Although the quenching value appears small, normalization against complete cell solubilization provides reproducible comparisons. We do not fully understand why CBX effects vary in Venus quenching experiments, but we speculate that its steroid-like pentacyclic structure may influence the agonistic effects of lysophospholipids. As noted in prior studies (DOI: 10.1085/jgp.201511505; DOI: 10.7554/eLife.54670), CBX likely acts as an allosteric modulator rather than a simple pore blocker, potentially contributing to these variations.
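
      The normalization described above can be written as Quench(%) = 100 × (F0 - F(t)) / (F0 - F_Triton), where F0 is the fluorescence when LPC is added and F_Triton the fluorescence after Triton X-100 solubilization. A minimal sketch, assuming background-corrected readings and F0 taken as the first point of the trace; the function name `quench_percent` and the values are hypothetical, and how the initial rate in Fig. 2C is then extracted is not specified here.

```python
def quench_percent(trace, f_triton):
    """Convert a fluorescence trace (starting at LPC addition) into
    percent quench relative to the total decrease after Triton X-100."""
    f0 = trace[0]                  # fluorescence at LPC addition
    total_drop = f0 - f_triton     # maximal decrease after solubilization
    if total_drop <= 0:
        raise ValueError("Triton X-100 reading must be below the initial value")
    return [100.0 * (f0 - f) / total_drop for f in trace]


# Toy trace: 50 and 100 units of quench out of a 500-unit maximal drop.
print(quench_percent([1000.0, 950.0, 900.0], f_triton=500.0))  # [0.0, 10.0, 20.0]
```

With this convention the trace starts at 0% and rises as iodide quenches mVenus, which matches the relabelled “Quench (%)” axis.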

      (5) Could the authors provide more information to help rationalize how YO-PRO-1, which has a charge of +2, can permeate what are thought to be anion-favouring Panx channels? We appreciate that the biophysical properties of Panx channels remain mysterious, but it would help to hear a bit more about the authors' thinking. It might also help to cite other papers that have measured YO-PRO-1 uptake through Panx channels. Was the Strep-tagged construct of Panx1 expressed in GnTI- cells and shown to be functional using electrophysiology?

      Our recent study suggests that the electrostatic landscape along the permeation pathway may influence ion selectivity (DOI: 10.1101/2024.06.13.598903). However, we have not yet fully elucidated how both anions and cations permeate Panx1. Based on our findings, ion selectivity may vary with the intensity and duration of the activation stimulus. Cation permeation through Panx1 is often demonstrated with YO-PRO-1, which measures uptake over minutes, unlike electrophysiological measurements conducted over milliseconds to seconds. We have referenced two representative studies employing YO-PRO-1 to assess Panx1 activity. Whole-cell current measurements from a similar construct with an intracellular loop insertion indicate that our STREP-tagged construct likely retains functional capacity.

      (6) In Fig 5 panel C, data is presented as the ratio of LPC induced current at -60 mV to that measured at +110 mV in the absence of LPC. What is the rationale for analysing the data this way? It would be helpful to also plot the two values separately for all of the constructs presented so the reader can see whether any of the mutants disproportionately alter LPC induced current relative to depolarization activated current. Also, for all currents shown in the figures, the authors should include a dashed coloured line at zero current, both for the LPC activated currents and the voltage steps.

      We used the ratio of LPC-induced current to the current measured at +110 mV to determine whether any of the mutants disproportionately affect LPC-induced current relative to depolarization-activated current. Since the mutants that did not respond to LPC also exhibited smaller voltage-stimulated currents than those that did respond, we reasoned that using this ratio would better capture the information the reviewer is suggesting to gauge. Showing the zero current level may be helpful if the goal was to compare basal currents, which in our experience vary significantly from patch to patch. However, since we are comparing LPC- and voltage-induced currents within the same patch, we believe that including basal current measurements would not add useful information to our study.
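
      The within-patch normalization can be illustrated with a short sketch. It assumes the ratio is formed from current magnitudes (the LPC-induced current at -60 mV is inward, hence negative) and uses made-up amplitudes; the function name `lpc_to_voltage_ratio` is hypothetical. Because both currents are measured in the same patch, a difference in channel expression scales numerator and denominator alike and cancels in the ratio.

```python
def lpc_to_voltage_ratio(i_lpc_minus60_pa, i_plus110_pa):
    """Ratio of LPC-induced current magnitude at -60 mV to the
    depolarization-activated current magnitude at +110 mV (same patch)."""
    if i_plus110_pa == 0:
        raise ValueError("voltage-activated current must be nonzero")
    return abs(i_lpc_minus60_pa) / abs(i_plus110_pa)


# Two hypothetical patches with a 3-fold difference in expression.
low = lpc_to_voltage_ratio(-100.0, 500.0)
high = lpc_to_voltage_ratio(-300.0, 1500.0)
print(low, high)  # 0.2 0.2
```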

      Given that new experiments were included to further highlight the significance of the discovery of Panx1 agonists, we opted to separate the structure-based mechanistic studies from this manuscript and removed this experiment along with the docking and cryo-EM studies.

      (7) The fragmented NTD density shown in Fig S8 panel A may resemble either lipid density or the average density of both NTD and lipid. For example, Class7 and Class8 in Fig.S8 panel D displayed split densities, which may resemble a phosphate head group and two tails of lipid. A protomer mask may not be the ideal approach to separate different classes of NTD because as shown in Fig S8 panel D, most high-resolution features are located on TM1-4, suggesting that the classification was focused on TM1-4. A more suitable approach would involve using a smaller mask including NTD, TM1, and the neighbouring TM2 region to separate different NTD classes.

      We agree with the reviewer and attempted 3D classification using multiple smaller masks including the suggested region. However, the maps remained poorly defined, and we were unable to confidently assign the NTD.

      (8) The authors don’t discuss whether the LPC-bound structures display changes in the external part of the pore, which is the anion-selective filter and the narrower part of the pore. If there are no conformational changes there, then the present structures cannot explain permeability to large molecules like ATP. In this context, a plot for the pore dimension will be helpful to see differences along the pore between their different structures. It would also be clearer if the authors overlaid maps of protomers to illustrate differences at the NTD and the "selectivity filter."

      Both maps show that the narrowest constriction, formed by W74, has a diameter of approximately 9 Å. Previous steered molecular dynamics simulations suggest that ATP can permeate through such a constriction, implying an ion selection mechanism distinct from a simple steric barrier.

      (9) The time between the addition of LPC to the nanodisc-reconstituted protein and grid preparation is not mentioned. Dynamic diffusion of LPC could result in equal probabilities for the bound and unbound forms. This raises the possibility of finding the Primed state in the LPC-bound state as well. Additionally, can the authors rationalize how LPC might reach the pore region when the channel is in the closed state before the application of LPC?

      We appreciate the reviewer’s insight. We incubated LPC and nanodisc-reconstituted protein for 30 minutes, speculating that LPC approaches the pore similarly to other lipids in prior structures. In separate studies, we are optimizing conditions to capture more defined conformations.

      (10) In the cryo-EM map of the “resting” state (EMDB-21150), a part of the density was interpreted as NTD flipped to the intracellular side. This density, however, is poorly defined, and not connected to the S1 helix, raising concerns about whether this density corresponds to the NTD as seen in the “resting” state structure (PDB-ID: 6VD7). In addition, some residues in the C-terminus (after K333 in frog PANX1) are missing from the atomic model. Some of these residues are predicted by AlphaFold2 to form a short alpha helix and are shown to form a short alpha helix in some published PANX1 structures. Interestingly, in both the AF2 model and 6WBF, this short alpha helix is located approximately in the weak density that the authors suggest represents the “flipped” NTD. We encourage the authors to be cautious in interpreting this part as the “flipped” NTD without further validation or justification.

      We agree that the density corresponding to the NTD extended into the cytoplasm is relatively weak. In our recent study, we compared two Panx1 structures with or without the mentioned C-terminal helix and found evidence suggesting the likelihood of NTD extension (DOI: 10.1101/2024.06.13.598903). Nevertheless, to prevent potential confusion, we have removed the cryo-EM panel from this manuscript.

      (11) Since the authors did not observe densities of bound LPC in the cryo-EM map, it is important to acknowledge in the text the inherent limitations of using docking and mutagenesis methods to locate where LPC binds.

      Thank you for the suggestion. We have removed this section to avoid potential confusion.

      Optional suggestions:

      (1) The authors used MeOH to extract mouse liver for reversed-phase chromatography. Was the study designed to focus on hydrophobic compounds that likely bind to the TMD? Panx1 has both ECD and ICD with substantial sizes that could interact with water soluble compounds? Also, the use of whole-cell recordings to screen fractions would not likely identify polar compounds that interact with the cytoplasmic part of the TMD? It would be useful for the authors to comment on these aspects of their screen and provide their rationale for fractionating liver rather than other tissues.

      We have added a rationale in line 90, stating: “The soluble fractions were excluded from this study, as the most polar fraction induced strong channel activities in the absence of exogenously expressed pannexins.” Additionally, we have included a figure to support this rationale (Fig. S1A).

      (2) The authors show that LPCs reversibly increase inward currents at a holding voltage of -60 mV (not always specified in legends) in cells expressing Panx1 and Panx2, and then show families of currents activated by depolarizing voltage steps in the absence of LPC, without asking what happens when the membrane is depolarized after LPC activation. If LPCs can be applied for long enough without disrupting recordings, it would be valuable to obtain both I-V and G-V relations before and after LPC activation of Panx channels. Does LPC disproportionately increase current at some voltages compared to others? Is the outward rectification reduced by LPC? Does Vrev remain unchanged (see point above)? It's hard to predict what would be observed, but almost any outcome from these experiments would suggest additional experiments to explore the extent to which the open states activated by LPC and depolarization are similar or distinct.

      Unfortunately, in our hands, the prolonged application of lysolipids at concentrations necessary to achieve significant currents tends to destabilize the patch. This makes it challenging to obtain G-V curves or perform the previously mentioned kinetic analyses. We believe this destabilization may be due to lysolipids’ surfactant-like qualities, which can disrupt the giga seal. Additionally, prolonged exposure seems to cause channel desensitization, which could be another confounding factor.

      (3) From the results presented, the authors cannot rule out that mutagenesis-induced insensitivity of Panx channels to LPCs results from allosteric perturbations in the channels rather than direct binding/gating by LPCs. In Fig 5 panels A-C, the authors introduced double mutants in TM1 and TM2 to interfere with LPC binding; however, the double mutants may also disrupt the interaction network formed within the NTD, TM1, and TM2. This disruption could potentially rearrange the conformation of the NTD, favouring the resting closed state. Three double Asn mutants, which abolished LPC-induced current, also exhibited lower currents upon voltage activation in Fig. S5, raising the possibility that the mutant channels fail to activate in response to LPC due to an increased energy barrier. One way to gain further insight would be to mutate residues in the NTD that interact with those substituted by the three double Asn mutants and to measure currents from both voltage activation and LPC activation. Such results might help to elucidate whether the three double Asn mutants interfere with LPC binding. It would also be important to show that the voltage-activated currents in Fig. S5 are sensitive to CBX.

      Thank you for the comment, with which we agree. Our initial intention was to use the mutagenesis studies to experimentally support the docking study. Due to uncertainties associated with the presented cryo-EM maps, we have decided to remove this study from the current manuscript. We will consider the proposed experiments in a future study.

      (4) Could the authors elaborate on how LPC opens Panx1 by altering the conformation of the NTDs in an uncoordinated manner, going from “primed” state to the “active” state. In the “primed” state, the NTDs seem to be ordered by forming interactions with the TMD, thus resulting in the largest (possible?) pore size around the NTDs. In contrast, in the “active” state, the authors suggest that the NTDs are fragmented as a result of uncoordinated rearrangement, which conceivably will lead to a reduction in pore size around NTDs (isn’t it?). It is therefore not intuitive to understand why a conformation with a smaller pore size represents an “active” state.

      We believe the uncoordinated arrangement of NTDs is dynamic, allowing for potential variations in pore size during the activated conformation. Alternatively, NTD movement may be coupled with conformational changes in TM1 and the extracellular domain, which in turn could alter the electrostatic properties of the permeation pathway. We believe a functional study exploring this mechanism would be more appropriately presented as a separate study.

      (5) Can the authors provide a positive control for these negative results presented in Fig S1B and C?

      The positive results are presented in Fig. 1D and E.

      (6) Raw images in Fig S6 and Fig S7 should contain units of measurement.

      Thank you for pointing this out.

      (7) It may be beneficial to show the superposition between the primed state and the activated state, for both the protomer and the overall structure, as well as the superposition between the primed state and PDB 7F8J.

      We attempted to superimpose the cryo-EM maps; however, visually highlighting the differences in figure format proved challenging. Higher-resolution maps would allow for model building, which would more effectively convey these distinctions.

      (8) Including the number of particles in each class in Fig S8 panels C and D would help in evaluating the quality of classification.

      Noted.

      (9) A table for cryo-EM statistics should be included.

      Thanks, noted.

      (10) n values are often provided as a range within legends but it would be better to provide individual values for each dataset. In many figures you can see most of the data points, which is great, but it would be easy to add n values to the plots themselves, perhaps in parentheses above the data points.

      While we agree that transparency is essential, adding n-values to each graph would make some figures less clear and potentially harder to interpret in this case. We believe that the dot plots, n-value range, and statistical analysis provide adequate support for our claims.

      (11) The way caspase activation of Panx channels is presented in the introduction could be viewed as dismissive or inflammatory for those who have studied that mechanism. We think the caspase activation literature is quite convincing and there is no need to be dismissive when pointing out that there are good reasons to believe that other mechanisms of activation likely exist. We encourage you to revise the introduction accordingly.

      Thank you for this comment. Although we intended to support the caspase activation mechanism in our introduction, we understand that the reviewer’s interpretation indicates a need for clarification. We hope the revised introduction removes any perception of dismissiveness.

      (12) Why is the patient data in Fig 4F normalized differently than everything else? Once the above issues with mVenus quenching data are clarified, it would be good to be systematic and use the same approach here.

      For Fig. 4F, we used a distinct normalization method to account for substantial day-to-day variation in experiments involving body fluids. Notably, we did not apply this normalization to other experimental panels due to their considerably lower day-to-day variation.

      (13) What was the rationale for using the structure from ref 35 in the docking task?

      The docking task utilized the human orthologue with a flipped-up NTD. We believe that this flipped-up conformation is likely the active form that responds to lysolipids. As our functional experiments primarily use the human orthologue for biological relevance, this structure choice is consistent. Our docking data shows that LPC does not dock at this site when using a construct with the downward-flipped NTD.

      (14) Perhaps better to refer to double Asn ‘substitutions’ rather than as ‘mutations’ because that makes one think they are Asn in the wt protein.

      Done.

      (15) From Fig S1, we gather that Panx2 is much larger than Panx1 and 3. If that is the case, it is worth pointing this out to readers somewhere.

      We have added the molecular weight of each subtype in the figure legend.

      (16) Please provide holding voltages and zero current levels in all figures presenting currents.

      We provided holding voltages. However, the zero current levels vary among the examples presented, making direct comparisons difficult. Since we are comparing currents with and without LPC, we believe that indicating zero current levels is unnecessary for this study.

      (17) While the authors successfully establish lysophospholipid-gating of Panx1 and Panx2, Panx3 appears unaffected. It may be advisable to be more specific in the title of the article.

      We are uncertain whether Panx3 is unaffected by lysophospholipids, as we have not observed activation of this subtype under any tested conditions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This is a comprehensive study that sheds light on how Wag31 functions and localises in mycobacterial cells. A clear link to interactions with CL is shown using microscopy in combination with fluorescent fusion constructs and lipid-specific dyes. Furthermore, studies using mutant versions of Wag31 shed light on the functions of each domain in the protein. My concerns/suggestions for the manuscript are minor:

      (1) Ln 130. A better clarification/discussion is required here. It is clear that both depletion and overexpression have an effect on levels of various lipids, but subsequent descriptions show that they affect different classes of lipids.

      We thank the reviewer for the comment. We have clarified this in the discussion of the revised manuscript. The lipid classes affected by the depletion of Wag31 differ from those affected by its overexpression. Wag31 is an adaptor protein that interacts with proteins of the ACCase complex (Meniche et al., 2014; Xu et al., 2014) that synthesize fatty acid precursors, and it regulates their activity (Habibi Arejan et al., 2022).

      The differing effects on lipid homeostasis could be attributed to changes in the stoichiometry of these Wag31 interactions. While Wag31 depletion would prevent such interactions from occurring and might affect lipid synthesis that directly depends on interactions between Wag31 and its protein partners, its overexpression would lead to promiscuous interactions and a change in the stoichiometry of native interactions that would ultimately modulate lipid synthesis pathways.

      (2) The pulldown assays results are interesting, but links are tentative.

      We thank the reviewer for the comment. The interactome of Wag31 was identified through the immunoprecipitation of FLAG-Wag31 complemented at an integrative locus in a Wag31 mutant background to avoid overexpression artifacts. We used Msm::gfp expressing an integrative copy (at the L5 locus) of FLAG-GFP as a control to subtract non-specific interactions. The experiment was performed in biological triplicates, and interactors that appeared in all replicates but not in the control were selected for further analysis. Although we identified more than 100 interactors of Wag31, we analyzed only the top 25 hits, with a PSM cut-off of 18 and a unique-peptide cut-off of 5. Additionally, two of Wag31's established interactors, AccD5 and Rne, were among the top five hits, thus validating our data.
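      The selection criteria described above (present in all replicates, absent from the FLAG-GFP control, PSM and unique-peptide cut-offs, top 25 hits) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the record layout, the function name select_interactors, the reading of both cut-offs as minimums that must hold in every replicate, and ranking by the lowest PSM count across replicates are all hypothetical choices, and the numbers in the usage example are toy data.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One protein identification in one replicate (hypothetical record layout)."""
    psm: int              # peptide-spectrum matches
    unique_peptides: int  # distinct peptides assigned to the protein


def select_interactors(replicates, control, min_psm=18, min_unique=5, top_n=25):
    """Return candidate interactors ranked by their weakest PSM count.

    replicates: list of dicts mapping protein name -> Hit, one per biological replicate
    control: set of protein names recovered in the FLAG-GFP control pull-down
    """
    # Keep only proteins detected in every replicate ...
    in_all = set.intersection(*(set(rep) for rep in replicates))
    # ... and drop anything the control also pulled down (non-specific binders).
    candidates = in_all - set(control)

    ranked = []
    for name in candidates:
        hits = [rep[name] for rep in replicates]
        # Assumption: both cut-offs are minimums that must hold in every replicate.
        if all(h.psm >= min_psm and h.unique_peptides >= min_unique for h in hits):
            ranked.append((min(h.psm for h in hits), name))

    # Highest minimum PSM first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in ranked[:top_n]]


# Toy data only: "GFP" mimics a control binder and "Weak" falls below the cut-offs.
rep = {"AccD5": Hit(40, 10), "Rne": Hit(30, 8), "GFP": Hit(100, 20), "Weak": Hit(10, 2)}
print(select_interactors([rep, rep, rep], control={"GFP"}))  # -> ['AccD5', 'Rne']
```

      Requiring the thresholds in every replicate is the stricter of the two possible readings; applying them to pooled counts would admit more hits.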

      As mentioned in line 139 of the previous version of the manuscript, we agree that the interactions can be either direct or mediated by a third partner. The fact that we obtained known interactors of Wag31 makes us believe these interactions are genuine. Moreover, for validation, we performed pull-down experiments by mixing E. coli lysates expressing full-length or truncated His-Wag31 with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used for these pull-down assays were stringent: the wash buffer contained 1% Triton X-100, which is intended to eliminate non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. As mentioned above, this caveat was stated in the previous version of the manuscript.

      (3) The authors may perhaps like to rephrase claims of effects on lipid homeostasis, as my understanding is that lipid localisation rather than catabolism/breakdown is affected.

      We thank the reviewer for the comment. In this manuscript, we are trying to convey that Wag31 is a spatiotemporal regulator of lipid metabolism. It is a peripheral protein that is hooked to the membrane via Cardiolipin and forms a scaffold at the poles, which helps localize several enzymes involved in lipid metabolism.

      Homeostasis is the process by which an organism maintains a steady state of balance and stability in response to changes. Depletion of Wag31 not only results in the delocalisation of lipids into intracellular lipid inclusions but also leads to changes in the levels of various lipid classes. Advances in the field of spatial biology underscore the importance of the native localization of biological molecules for maintaining the steady state of the cell. Hence, we have used the word “homeostasis” to describe both of the changes observed in lipid metabolism.

      Reviewer #2 (Public review):

      Summary:

      Kapoor et al. investigated the role of the mycobacterial protein Wag31 in lipid and peptidoglycan synthesis and sought to delineate the roles of the N- and C-terminal domains of Wag31. They demonstrated that modulating Wag31 levels influences lipid homeostasis in M. smegmatis and cardiolipin (CL) localisation in cells. Wag31 was found to preferentially bind CL-containing liposomes, and deleting the N-terminus of the protein significantly decreased this interaction. Novel interactions between Wag31 and proteins involved in lipid metabolism and cell wall synthesis were identified, suggesting that Wag31 recruits proteins to the intracellular membrane domain by direct interaction.

      Strengths:

      (1) The importance of Wag31 in maintaining lipid homeostasis is supported by several lines of evidence.

      (2) The interaction between Wag31 and cardiolipin, and the role of the N-terminus in this interaction, were convincingly demonstrated.

      Weaknesses:

      (1) MS experiments provide some evidence for novel protein-protein interactions. However, the pulldown experiments lack a valid negative control.

      We thank the reviewer for the comment. As negative controls, we included two non-interactors of Wag31, MmpL4 and MmpS5, which were not identified in our interactome data. As shown in Figure S3, we performed His pull-down experiments with each of them independently twice, each time with a positive control (Msm2092, a known interactor of Wag31). Revised Fig. S3b shows the E. coli lysate expressing His-Wag31, which was incubated with Msm lysates expressing FLAG-tagged MmpL4, MmpS5 or Msm2092 (revised Fig. S3c). The mixed lysates were pulled down with cobalt beads that bind to the His-tagged protein and analysed by Western blotting with an anti-FLAG antibody (revised Fig. S3d). These data confirm that the interactions validated through the pull-down assay were indeed specific.

      (2) The role of the N-terminus in the protein-protein interaction has not been ruled out.

      We thank the reviewer for the comment. Wag31<sub>Msm</sub> is a 272-amino-acid protein. The N-terminal region of Wag31, which houses the DivIVA domain, comprises the first 60 amino acids. Previously, we attempted to express the N-terminal (60 aa long) and C-terminal (212 aa long) truncated proteins from various mycobacterial shuttle vectors to perform MS/MS experiments. Despite numerous efforts, neither truncation could be expressed, with an N- or C-terminal FLAG tag or without a tag, from episomal or integrative vectors, owing to instability of the protein. Eventually, we successfully expressed the C-terminal region of Wag31 with N- and C-terminal hexa-His tags. However, this expression was not sufficient or stable enough for us to perform Ni<sup>2+</sup>-affinity pull-down experiments for mass spectrometry. The N-terminal region of Wag31 could not be expressed in M. smegmatis even with N- and C-terminal hexa-His tags.

      To rule out a role for the N-terminal region in mediating protein-protein interactions, we cloned the N-terminal region of Wag31, which comprises the DivIVA domain, into the pET28b vector (revised Fig. 7a). Subsequently, the truncated protein, hereafter called Wag31<sub>∆C</sub>, flanked by 6X His tags at both termini, was expressed in E. coli and mixed with Msm lysates expressing interactors of Wag31 (revised Fig. 7b-c). Earlier experiments with Wag31<sub>∆1-60</sub> (Wag31<sub>∆N</sub> in the revised manuscript) were performed with MurG, SepIVA, Msm2092 and AccA3 (Fig. 7e-g); we therefore used the same set of interactors to test our hypothesis. Briefly, His-Wag31<sub>∆C</sub> was mixed with Msm lysates expressing FLAG-MurG, -SepIVA, -Msm2092 or -AccA3, and pull-down experiments were performed as described previously. FLAG-MmpS5, a non-interactor of Wag31, was used as a negative control. As shown in revised Fig. 7d, His-Wag31 bound all four interactors whereas His-Wag31<sub>∆C</sub> did not, strengthening the conclusion that the interactions of Wag31 with other proteins are mediated by its C-terminal region. However, we cannot exclude the possibility that other interactors bind to the N-terminal region of Wag31. Unfortunately, owing to poor expression/instability of Wag31<sub>∆C</sub> in mycobacterial shuttle vectors, we are unable to perform a global interactome analysis of Wag31<sub>∆C</sub>.

      Reviewer #3 (Public review):

      Summary:

      This manuscript describes the characterization of mycobacterial cytoskeleton protein Wag31, examining its role in orchestrating protein-lipid and protein-protein interactions essential for mycobacterial survival. The most significant finding is that Wag31, which directs polar elongation and maintains the intracellular membrane domain, was revealed to have membrane tethering capabilities.

      Strengths:

      The authors provided a detailed analysis of Wag31 domain architecture, revealing distinct functional roles: the N-terminal domain facilitates lipid binding and membrane tethering, while the C-terminal domain mediates protein-protein interactions. Overall, this study offers a robust and new understanding of Wag31 function.

      Weaknesses:

      The following major concerns should be addressed.

      • Authors use 10-N-Nonyl-acridine orange (NAO) as a marker for cardiolipin localization. However, given that NAO is known to bind to various anionic phospholipids, how do the authors know that what they are seeing is specifically visualizing cardiolipin and not a different anionic phospholipid? For example, phosphatidylinositol is another abundant anionic phospholipid in mycobacterial plasma membrane.

      We thank the reviewer for the comment. Despite its promiscuous binding to other anionic phospholipids, 10-N-Nonyl-acridine orange is widely used to stain Cardiolipin and determine its localisation in bacterial cells and in eukaryotic mitochondria (Garcia Fernandez et al., 2004; Mileykovskaya & Dowhan, 2000; Renner & Weibel, 2011). This is because it has a stronger affinity for Cardiolipin than for other anionic phospholipids, with an affinity constant of 2 × 10<sup>6</sup> M<sup>−1</sup> for Cardiolipin association and 7 × 10<sup>4</sup> M<sup>−1</sup> for phosphatidylserine and phosphatidylinositol association (Petit et al., 1992). Additionally, no other stain is yet available for detecting Cardiolipin. Our protein-lipid binding assays suggest that Wag31 preferentially binds Cardiolipin over other anionic phospholipids (Fig. 4b); hence, it is likely that most of the redistribution of NAO fluorescence that we observe reflects Cardiolipin mislocalization due to altered Wag31 levels, with a smaller contribution coming indirectly from other anionic phospholipids displaced from the membrane by the loss of membrane integrity and the cell shape changes caused by altered Wag31 levels.
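      Taking the affinity constants quoted above at face value, the selectivity of NAO for Cardiolipin over phosphatidylserine and phosphatidylinositol can be written as a simple ratio. This is an illustrative calculation from the cited values (Petit et al., 1992), not a new measurement; K denotes the association constant for each lipid:

      $$\frac{K_{\mathrm{CL}}}{K_{\mathrm{PS/PI}}} = \frac{2 \times 10^{6}\,\mathrm{M^{-1}}}{7 \times 10^{4}\,\mathrm{M^{-1}}} \approx 29$$

      That is, roughly a 30-fold preference for Cardiolipin. This ratio of association constants does not by itself determine how much dye is bound to each lipid in a cell, which also depends on the relative abundance of each anionic phospholipid in the membrane.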

      • Authors' data show that the N-terminal region of Wag31 is important for membrane tethering. The authors' data also show that the N-terminal region is important for sustaining mycobacterial morphology. However, the authors' statement in Line 256 "These results highlight the importance of tethering for sustaining mycobacterial morphology and survival" requires additional proof. It remains possible that the N-terminal region has another unknown activity, and this yet-unknown activity rather than the membrane tethering activity drives the morphological maintenance. Similarly, the N-terminal region is important for lipid homeostasis, but the statement in Line 270, "the maintenance of lipid homeostasis by Wag31 is a consequence of its tethering activity" requires additional proof. The authors should tone down these overstatements or provide additional data to support their claims.

      We agree with the reviewer that the N-terminal region may have another function that contributes to sustaining mycobacterial physiology and survival. We will revise our statements in the paper to reflect the data. The results suggest that the tethering activity of the N-terminal region may contribute to mycobacterial morphology and survival; however, additional functions of this region cannot be ruled out. Similarly, the maintenance of lipid homeostasis by Wag31 may be associated with its tethering activity, although other mechanisms could also contribute to this process.

      • Authors suggest that Wag31 acts as a scaffold for the IMD (Fig. 8). However, Meniche et. al. has shown that MurG as well as GlfT2, two well-characterized IMD proteins, do not colocalize with Wag31 (DivIVA) (https://doi.org/10.1073/pnas.1402158111). IMD proteins are always slightly subpolar while Wag31 is located to the tip of the cell. Therefore, the authors' biochemical data cannot be easily reconciled with microscopic observations in the literature. This raises a question regarding the validity of protein-protein interaction shown in Figure 7. Since this pull-down assay was conducted by mixing E. coli lysate expressing Wag31 and Msm lysate expression Wag31 interactors like MurG, it is possible that the interactions are not direct. Authors should interpret their data more cautiously. If authors cannot provide additional data and sufficient justifications, they should avoid proposing a confusing model like Figure 8 that contradicts published observations.

      In the literature, MurG and GlfT2 have been shown to have polar localisation (Freeman et al., 2023; Hayashi et al., 2016; Kado et al., 2023), and two groups have shown slightly sub-polar localisation of MurG (García-Heredia et al., 2021; Meniche et al., 2014). Additionally, Freeman et al. (2023) showed SepIVA to be a spatiotemporal regulator of MurG. MS/MS analysis of the Wag31 immunoprecipitation identified both MurG and SepIVA as interactors of Wag31 (Fig. 3). Given that Wag31 also displays polar localisation, it is likely that it associates with polar MurG. However, since a sub-polar localisation of MurG has also been reported, it is possible that they do not interact directly and that another protein mediates their interaction. Based on the above, we will modify the model proposed in Fig. 8.

      To validate the interactions, we performed pull-down experiments by mixing E. coli lysates expressing full-length or truncated His-Wag31 with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used for these pull-down assays were stringent: the wash buffer contained 1% Triton X-100, which is intended to eliminate non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. We will describe this caveat in the revised manuscript and propose a model that reflects the results we obtained.

      References:

      Freeman, A. H., Tembiwa, K., Brenner, J. R., Chase, M. R., Fortune, S. M., Morita, Y. S., & Boutte, C. C. (2023). Arginine methylation sites on SepIVA help balance elongation and septation in Mycobacterium smegmatis. Mol Microbiol, 119(2), 208-223. https://doi.org/10.1111/mmi.15006

      Garcia Fernandez, M. I., Ceccarelli, D., & Muscatello, U. (2004). Use of the fluorescent dye 10-N-nonyl acridine orange in quantitative and location assays of cardiolipin: a study on different experimental models. Anal Biochem, 328(2), 174-180. https://doi.org/10.1016/j.ab.2004.01.020

      García-Heredia, A., Kado, T., Sein, C. E., Puffal, J., Osman, S. H., Judd, J., Gray, T. A., Morita, Y. S., & Siegrist, M. S. (2021). Membrane-partitioned cell wall synthesis in mycobacteria. eLife, 10. https://doi.org/10.7554/eLife.60263

      Habibi Arejan, N., Ensinck, D., Diacovich, L., Patel, P. B., Quintanilla, S. Y., Emami Saleh, A., Gramajo, H., & Boutte, C. C. (2022). Polar protein Wag31 both activates and inhibits cell wall metabolism at the poles and septum. Front Microbiol, 13, 1085918. https://doi.org/10.3389/fmicb.2022.1085918

      Hayashi, J. M., Luo, C. Y., Mayfield, J. A., Hsu, T., Fukuda, T., Walfield, A. L., Giffen, S. R., Leszyk, J. D., Baer, C. E., Bennion, O. T., Madduri, A., Shaffer, S. A., Aldridge, B. B., Sassetti, C. M., Sandler, S. J., Kinoshita, T., Moody, D. B., & Morita, Y. S. (2016). Spatially distinct and metabolically active membrane domain in mycobacteria. Proc Natl Acad Sci U S A, 113(19), 5400-5405. https://doi.org/10.1073/pnas.1525165113

      Kado, T., Akbary, Z., Motooka, D., Sparks, I. L., Melzer, E. S., Nakamura, S., Rojas, E. R., Morita, Y. S., & Siegrist, M. S. (2023). A cell wall synthase accelerates plasma membrane partitioning in mycobacteria. eLife, 12, e81924. https://doi.org/10.7554/eLife.81924

      Meniche, X., Otten, R., Siegrist, M. S., Baer, C. E., Murphy, K. C., Bertozzi, C. R., & Sassetti, C. M. (2014). Subpolar addition of new cell wall is directed by DivIVA in mycobacteria. Proc Natl Acad Sci U S A, 111(31), E3243-E3251. https://doi.org/10.1073/pnas.1402158111

      Mileykovskaya, E., & Dowhan, W. (2000). Visualization of phospholipid domains in Escherichia coli by using the cardiolipin-specific fluorescent dye 10-N-nonyl acridine orange. J Bacteriol, 182(4), 1172-1175. https://doi.org/10.1128/JB.182.4.1172-1175.2000

      Petit, J. M., Maftah, A., Ratinaud, M. H., & Julien, R. (1992). 10N-nonyl acridine orange interacts with cardiolipin and allows the quantification of this phospholipid in isolated mitochondria. Eur J Biochem, 209(1), 267-273. https://doi.org/10.1111/j.1432-1033.1992.tb17285.x

      Renner, L. D., & Weibel, D. B. (2011). Cardiolipin microdomains localize to negatively curved regions of Escherichia coli membranes. Proc Natl Acad Sci U S A, 108(15), 6264-6269. https://doi.org/10.1073/pnas.1015757108

      Schägger, H. (2006). Tricine-SDS-PAGE. Nat Protoc, 1(1), 16-22. https://doi.org/10.1038/nprot.2006.4

      Xu, W. X., Zhang, L., Mai, J. T., Peng, R. C., Yang, E. Z., Peng, C., & Wang, H. H. (2014). The Wag31 protein interacts with AccA3 and coordinates cell wall lipid permeability and lipophilic drug resistance in Mycobacterium smegmatis. Biochem Biophys Res Commun, 448(3), 255-260. https://doi.org/10.1016/j.bbrc.2014.04.116

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Ln 130. A better clarification/discussion is required here. It is clear that both depletion and overexpression have an effect on levels of various lipids, but subsequent descriptions show that they affect different classes of lipids.

      We thank the reviewer for the comment. We have included a clarification for this in the discussion section.

      (2) The pulldown assays results are interesting, but the links are tentative.

      We thank the reviewer for the comment. The interactome of Wag31 was identified through the immunoprecipitation of FLAG-tagged Wag31 complemented at an integrative locus in a Wag31 mutant background to avoid overexpression artifacts. We used Msm::gfp expressing an integrative copy (at the L5 locus) of FLAG-GFP as a control to subtract non-specific interactions. The experiment was performed in biological triplicates, and interactors that appeared in all replicates were selected for further analysis. Although we identified more than 100 interactors of Wag31, we analyzed only the top 25 hits, with a PSM cut-off of 18 and a unique-peptide cut-off of 5. Additionally, two of Wag31's established interactors, AccD5 and Rne, were among the top five hits, thus validating our data.

      Although we agree that the interactions can be either direct or mediated by a third partner, the fact that we obtained known interactors of Wag31 makes us believe these interactions are genuine. Moreover, for validation, we performed pull-down experiments by mixing E. coli lysates expressing full-length or truncated His-Wag31 with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used for these pull-down assays were stringent: the wash buffer contained 1% Triton X-100, which is intended to eliminate non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. We will describe this caveat in the revised manuscript.

      (3) The authors may perhaps like to rephrase claims of effects on lipid homeostasis, as my understanding is that lipid localisation rather than catabolism/breakdown is affected.

      We thank the reviewer for the comment. In this manuscript, we are trying to convey that Wag31 is a spatiotemporal regulator of lipid metabolism. It is a peripheral protein that is hooked to the membrane via Cardiolipin and forms a scaffold at the poles, which helps localize several enzymes involved in lipid metabolism.

      Homeostasis is the process by which an organism maintains a steady state of balance and stability in response to changes. Depletion of Wag31 not only results in the delocalisation of lipids into intracellular lipid inclusions but also leads to changes in the levels of various lipid classes. Advances in the field of spatial biology underscore the importance of the native localization of biological molecules for maintaining the steady state of the cell. Hence, we have used the word “homeostasis” to describe both of the changes observed in lipid metabolism.

      Reviewer #2 (Recommendations for the authors):

      I recommend the following experiments to strengthen the data presented:

      (1) Include a non-interacting FLAG-tagged protein as a negative control in the pull-down experiment to strengthen this data.

      We thank the reviewer for the comment. As suggested, we have included non-interacting FLAG-tagged proteins as negative controls in the pull-down experiment. We chose MmpL4 and MmpS5, which were not found in the Wag31 interactome data. We performed pull-down experiments with both of them and included a Wag31 interactor, Msm2092, as a positive control. Revised Fig. S3b shows the E. coli lysate expressing His-Wag31, which was incubated with Msm lysates expressing FLAG-tagged MmpL4, MmpS5 or Msm2092 (revised Fig. S3c). The mixed lysates were pulled down with cobalt beads that bind to the His-tagged protein and analysed by Western blotting with an anti-FLAG antibody. The pull-down experiments were performed independently twice, each time with Msm2092 as the positive control (revised Fig. S3d).

      (2) Perform the pull-down experiments using only the Wag31 N-terminus to rule out any role that it may have in the protein-protein interactions.

      We thank the reviewer for the comment. To rule out a role for the N-terminal region of Wag31 in mediating protein-protein interactions, we cloned the N-terminal region of Wag31, which comprises the DivIVA domain, into the pET28b vector (revised Fig. 7a). Subsequently, the truncated protein, hereafter called Wag31<sub>∆C</sub>, flanked by 6X His tags at both termini, was expressed in E. coli and mixed with Msm lysates expressing interactors of Wag31 (revised Fig. 7b-c). Earlier experiments with Wag31<sub>∆1-60</sub> (Wag31<sub>∆N</sub>) were performed with MurG, SepIVA, Msm2092 and AccA3 (previous Fig. 7), so we used the same set of interactors to test our hypothesis. Briefly, His-Wag31<sub>∆C</sub> was mixed with Msm lysates expressing FLAG-MurG, -SepIVA, -Msm2092 or -AccA3, and pull-down experiments were performed as described previously. FLAG-MmpS5, a non-interactor of Wag31, was used as a negative control. As shown in revised Fig. 7d, His-Wag31 bound all four interactors whereas His-Wag31<sub>∆C</sub> did not, strengthening the conclusion that the interactions of Wag31 with other proteins are mediated by its C-terminal region. However, we cannot exclude the possibility that other proteins bind to the N-terminal region of Wag31. Unfortunately, owing to poor expression/instability of Wag31<sub>∆C</sub> in mycobacterial shuttle vectors, we could not perform a global interactome analysis of Wag31<sub>∆C</sub>.

      Minor comments:

      - Please check the legend of Fig. 1g, it appears to be labelled incorrectly.

      We have checked the legend, and it is correct. Fig. 1g shows the percentages of cells of the three strains, i.e. Msm+ATc, Δwag31-ATc and Δwag31+ATc, displaying rod, round or bulged morphology.

      - For MS/MS analysis, a GFP control is mentioned but it is not indicated how this was incorporated in the data analysis. This information should be added.

      We have incorporated that in the revised methodology.

      - The information presented in Fig. 3a, e and f could be combined in one table.

      We appreciate the idea of the reviewer but we prefer a pictorial representation of the data. It allows readers to consume the information in parts, make quicker comparisons and understand trends easily.

      - Fig. 4c Wag31K20A appears smaller in size than the wild-type protein - why is this the case? Is this not a single amino acid substitution?

      Although K20A is a single amino acid substitution, it alters the mobility of Wag31 on SDS-PAGE gels. Sequence analysis of the plasmid expressing Wag31<sub>K20A</sub> doesn’t show any mutations other than the desired K20A. The change in mobility could be due to a change in the conformation of Wag31<sub>K20A</sub>, a change in its ability to bind SDS, or both, either of which would modify its mobility in an electric field.

      - Please clarify what is contained in the first panel of fig 4e. compared to what is in the second panel.

      The first panel shows CL-DiI liposomes before incubation with Wag31-GFP, and the second panel shows CL-DiI liposomes after incubation with Wag31-GFP. The third panel shows the mixture in the green channel, to examine the localisation of Wag31-GFP in the liposome-protein mix. The fourth panel shows the merge of the second and third panels.

      - The data in Fig 6d suggests higher levels of CL in the ∆wag31 compared to wild-type - how do the authors reconcile this with the MS data in Fig. 2g showing lower CL levels?

      Fig. 6d represents the distribution of CL localisation in the tested mycobacterial strains, whereas Fig. 2g shows the absolute levels of CL in the various strains. We place greater confidence in the lipidomics data, which suggest downregulation of CL species. NAO staining and microscopy serve only to study the localization of CL along the cell and cannot be used to reliably quantify CL levels. Staining with a probe such as NAO depends on factors such as the hydrophobicity and permeability of the cell wall, which we expect to be severely altered in a Wag31 mutant. Therefore, the increased NAO staining seen in the Wag31 mutant could simply reflect increased uptake of the dye rather than higher absolute levels of CL. The specificity of staining and localization, however, can be expected to be unaltered.

      Reviewer #3 (Recommendations for the authors):

      Following are suggestions for improving the writing and presentation.

      • Figure 1: the meaning of the yellow arrows in panels f and h should be explained in the figure legend.

      We have incorporated this into the revised legend. In Fig. 1f, the yellow arrowhead marks the bulged pole morphology, whereas in Fig. 1h it indicates intracellular lipid inclusions.

      • Figure 7 legend refers to panels g, h, and i. However, Figure 7 only has panels a-c. The legend lacks a description of panel c.

      We have corrected the typos and the legend.

      • Figure S1, F2-R2 and F3-R3 expected sizes should be stated in the legend of the figure.

      We have updated the legends.

      • Figure S5, is this the same figure as 5e? If so, there is no need for this figure.

      We have removed Fig. S5.

      • Methods need to be written more carefully with enough details. I listed some of the concerns below.

      Detailed methods were previously provided in the supplementary material; we have now moved them to the Materials and Methods section of the revised manuscript.

      • Line 392, provide more details on western blotting. What is the secondary antibody? What image documentation system was used?

      We have updated the methodology.

      • Line 400, while the methods may be the same as the reference 64, authors should still provide key details such as the way samples were fixed and processed for SEM and TEM.

      We have provided a detailed description of the same in methodology in the revised version.

      • Line 437, how do authors calculate the concentration of liposome to be 10 µM? Do they possibly mean the concentration of phospholipids used to make the liposomes?

      Yes, this is the concentration of total lipids used to make the liposomes. Wag31 or its mutants (1 μM) were mixed with 100 nm extruded liposomes containing 10 μM total lipid in separate Eppendorf tubes.

      • Supplemental Line 9, "turns of" should read "turns off".

      We have edited this.

      • Supplemental Line 13, define LHS and RHS.

      LHS (left-hand sequence) and RHS (right-hand sequence) refer to the upstream and downstream flanking regions of the gene of interest, respectively.

      • Supplemental Line 20, indicate the manufacturer of the microscope and type of the objective lens.

      We have added these details now.

      • Supplemental Line 31, define MeOH, or use a chemical formula like chloroform.

      MeOH is methanol. We have provided a chemical formula in the revised version.

      • Supplemental Line 53, indicate the concentration of trypsin.

      We have included that in the revised version.

      • Supplemental Line 72, g is not a unit. "30,000 g" should be "30,000x g".

      We have revised this in the manuscript.

      • Supplemental Line 114, provide more details on western blotting. What is the manufacturer of the anti-FLAG antibody? What is the secondary antibody? How was antibody binding visualized? What image documentation system was used?

      We have provided these details in the revised version.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate ligand and protein-binding processes in GPCRs (including dimerization) by the multiple walker supervised molecular dynamics method. The paper is interesting and it is very well written.

      Strengths:

      The authors' method is a powerful tool to gain insight into the structural basis for the pharmacology of G protein-coupled receptors.

      We thank the Reviewer for the positive comment on the manuscript and the proposed methods.

      Reviewer #2 (Public review):

      The study by Deganutti and co-workers is a methodological report on an adaptive sampling approach, multiple walker supervised molecular dynamics (mwSuMD), which represents an improved version of the previous SuMD.

      The case studies concern complex conformational transitions in a number of G protein-coupled receptors (GPCRs) involving long-timescale motions, such as binding-unbinding and collective motions of domains or portions of the receptor. GPCRs are specialized GEFs (guanine nucleotide exchange factors) of heterotrimeric Gα proteins of the Ras GTPase superfamily. They constitute the largest superfamily of membrane proteins and are of central biomedical relevance as privileged targets of currently marketed drugs.

      MwSuMD was exploited to address:

      a) binding and unbinding of the arginine-vasopressin (AVP) cyclic peptide agonist to the V2 vasopressin receptor (V2R);

      b) molecular recognition of the β2-adrenergic receptor (β2-AR) and heterotrimeric GDP-bound Gs protein;

      c) molecular recognition of the A1-adenosine receptor (A1R) and palmitoylated and geranylgeranylated membrane-anchored heterotrimeric GDP-bound Gi protein;

      d) the whole process of GDP release from membrane-anchored heterotrimeric Gs following interaction with the glucagon-like peptide 1 receptor (GLP1R), converted to the active state following interaction with the orthosteric non-peptide agonist danuglipron.

      The revised version has improved clarity and rigor compared with the original, partly thanks to the reduction in the number of complex case studies that were treated superficially.

      The mwSuMD method is solid and valuable, has wide applicability, and is compatible with the most widely used MD engines. It may be of interest to the computational structural biology community.

      The huge amount of high-resolution data on GPCRs makes those systems suitable, although challenging, for method validation and development.

      While the approach is less energy-biased than other enhanced sampling methods, atomic-level knowledge of binding sites/interfaces and conformational states is needed to define the supervised metrics, and the higher the resolution of such metrics, the more accurate the outcome is expected to be. Definition of the metrics is a user- and system-dependent process.

      We thank the Reviewer for the positive comment on the revised manuscript and mwSuMD. We agree that the choice of supervised metrics is user- and system-dependent. We aim to improve this aspect in the future with the aid of interpretable machine learning.

      Reviewer #3 (Public review):

      Summary:

      In the present work Deganutti et al. report a structural study on GPCR functional dynamics using a computational approach called supervised molecular dynamics.

      Strengths:

      The study has potential to provide novel insight into GPCR functionality. Example is the interaction between D344 and R385 identified during the Gs coupling by GLP-1R. However, validation of the findings, even computationally through for instance in silico mutagenesis study, is advisable.

      Weaknesses:

      No significant advance over the existing structural data on GPCRs and GPCR/G protein coupling is provided. Most of the results reproduce previously reported structures.

      The methodological focus of our study, mwSuMD, is an enhancement of supervised molecular dynamics (SuMD) that allows two metrics to be supervised at the same time and uses a score, rather than a tabù-like algorithm, to handle the simulation. Further changes are the seeding of parallel short replicas (walkers) rather than a series of short simulations, and the software implementation for different MD engines (e.g., Acemd, OpenMM, NAMD, Gromacs).

      We agree with the Reviewer that experimental validation of the findings would be advisable, as for any computational prediction. We are confident that future studies from our group employing mwSuMD will inform mutagenesis and BRET-based experiments.

      Reviewer #2 (Recommendations for the authors):

      As for GLP1R, I remain convinced that 7LCI would have been a better reference for all simulations than 7LCJ, also because 7LCI has a slightly more complete ECD.

      We agree that 7LCI would have been a better starting point for simulations than 7LCJ because it includes the stalk region, unlike 7LCJ. However, we do not think this choice influenced the outcome, because the stalk is the most flexible segment of GLP1R and its initial conformation is usually not retained during MD simulations.

      Please correct throughout the description of the 6LN2 structure of GLP1R as ligand-free or apo, because that structure is in fact bound to a negative allosteric modulator docked at the cytosolic end of helix 6.

      We thank the reviewer for this correction. The text has been modified accordingly.

      As for the beta2-AR, the "full-length" AlphaFold model downloaded from the GPCRdb is not an intermediate active state because it is very similar to the receptor in the 3SN6 complex with Gs. Please, eliminate the inappropriate and speculative adjective "intermediate".

      We have changed “intermediate” to “not fully active”, which is less speculative since full activation can be achieved only in the presence of the G protein.

      Incidentally, in that model, the C-tail, eliminated by the authors, is completely wrong and occupies the G protein binding site. It is not clear to me why the authors preferred to use an AlphaFold model as the input for simulations rather than a high-resolution structural model, e.g., 4LDO. Perhaps the reason is that all ICL regions, including ICL3, were modeled by AlphaFold, even if with low confidence. I disagree with that choice.

      We understand the reviewer’s point of view. Had we simulated an “equilibrium” receptor-ligand complex, we would have made the same choice. However, the conformational changes occurring during G protein binding are so substantial that the starting conformation of the receptor becomes almost irrelevant, as long as a sensible structure is used.

      Reviewer #3 (Recommendations for the authors):

      The revised version of the manuscript is more concise, focusing only on two systems. However, the authors have responded superficially to the reviewers' comments, merely deleting sections of text, making minor corrections, or adding small additions to the text. In particular, the authors have not addressed the main critical points raised by both Reviewer 2 and Reviewer 3. 

      For example, the RMSD values for the binding of PF06882961 to GLP-1R remain high, raising doubts about the predictive capabilities of the method, at least for this type of system.

      What is the RMSD of the ligand relative to the experimental pose obtained in the simulations? This value must be included in the text.

      We have added this information about the PF06882961 RMSD to the text, which on page 6 now reads “We simulated the binding of PF06882961, reaching an RMSD to its bound conformation in 7LCJ of 3.79 ± 0.83 Å (computed on the second half of the merged trajectory, superimposing on GLP-1R Cα atoms of TMD residues 150 to 390), using multistep supervision on different system metrics (Figure 2) to model the structural hallmark of GLP-1R activation (Video S5, Video S6).”

      Similarly, the activation mechanism of GLP-1R is only partially simulated.

      Furthermore, it is not particularly meaningful to justify the high RMSD values of the SuMD simulations for the binding of Gs to GLP-1R by comparing them with those reported under unbiased MD conditions. "Replica 2, in particular, well reproduced the cryo-EM GLP-1R complex as suggested by RMSDs to 7LCI of 7.59 ± 1.58 Å, 12.15 ± 2.13 Å, and 13.73 ± 2.24 Å for Gα, Gβ, and Gγ respectively. Such values are not far from the RMSDs measured in our previous simulations of GLP-1R in complex with Gs and GLP-1<sup>49</sup> (Gα = 6.18 ± 2.40 Å; Gβ = 7.22 ± 3.12 Å; Gγ = 9.30 ± 3.65 Å), which indicates overall higher flexibility of Gβ and Gγ compared to Gα, which acts as a sort of fulcrum bound to GLP-1R."

      Without delving into the accuracy of the various calculations, the authors should acknowledge that comparing protein structures with such high RMSD values has no meaningful significance in terms of convergence toward the same three-dimensional structure.

      The text has been edited to accommodate the reviewer’s suggestion while still giving readers a measure of the high flexibility of Gs bound to GLP-1R. It now reads “Such values do not support convergence with the static experimental structure but are not far from the RMSDs measured in our previous simulations of GLP-1R in complex with G<sub>s</sub> and GLP-1 (G<sub>α</sub> = 6.18 ± 2.40 Å; G<sub>β</sub> = 7.22 ± 3.12 Å; G<sub>γ</sub> = 9.30 ± 3.65 Å), which indicates overall higher flexibility of G<sub>β</sub> and G<sub>γ</sub> compared to G<sub>α</sub>, which acts as a sort of fulcrum bound to GLP-1R.”

      Have the authors simulated the binding of the Gs protein using the experimentally active structure of GLP-1R in complex with the ligand PF06882961 (PDB ID 7LCJ)? Such a simulation would be useful to assess the quality of the binding simulation of Gs to the GLP1R/PF06882961 complex obtained from the previous SuMD.

      We considered performing the Gs binding simulation to the active structure of GLP-1R.

      However, the fully active state of GLP-1R (and other class B receptors), as reported in 7LCJ, depends on the presence of Gs and can be reached only upon effector coupling. Since the unbound receptor is unlikely to already be in the fully active state, we reasoned that using it as a starting point for Gs binding simulations would have introduced an artifact.

      An example of the insufficient depth of the authors' replies can be seen in their response: "We note that among the suggested references, only Mafi et al report about a simulated G protein (in a pre-formed complex) and none of the work sampled TM6 rotation without input of energy."

      This statement is inaccurate. For instance, D'Amore et al. (Chem 2024, doi: 10.1016/j.chempr.2024.08.004) simulated Gs coupling to A2A as well as TM6 rotation, as did Maria-Solano and Choi (eLife 2023, doi: 10.7554/eLife.90773.1). The former employed path collective variables metadynamics, which is not cited in the introduction or the discussion, despite its relevance to the methodologies mentioned.

      Respectfully, our previous reply is correct, as all of the mentioned articles used enhanced (energy-biased) approaches, so the claim “none of the work sampled TM6 rotation without input of energy” stands. The reference to D’Amore et al. (published after the previous round of reviews of this manuscript) has been added to the introduction; we thank the reviewer for pointing it out. 

      Additionally, SuMD employs a tabu algorithm that applies geometric supervision to the simulation, serving as an alternative approach to enhancing sampling compared to the "input of energy" techniques as called by the authors. A fair discussion should clearly acknowledge this aspect of the SuMD methodology.

      We have now specified in the Methods that a tabù-like algorithm is part of SuMD, which, despite being the parent technique of mwSuMD, is not the focus of the present work. We provide extended references for readers interested in SuMD. mwSuMD, on the other hand, does not use a tabù-like algorithm but rather a continuative approach based on a score to select the best walker for each batch, as described in the Methods.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper contains what could be described as a "classic" approach towards evaluating a novel taste stimulus in an animal model, including standard behavioral tests (some with nerve transections), taste nerve physiology, and immunocytochemistry of taste cells of the tongue. The stimulus being tested is ornithine, from a class of stimuli called "kokumi" (in terms of human taste); these kokumi stimuli appear to enhance other canonical tastes, increasing what are essentially hedonic attributes of other stimuli. The mechanism for ornithine detection is thought to be GPRC6A receptors expressed in taste cells. The authors showed evidence for this in an earlier paper with mice; this paper evaluates ornithine taste in a rat model and comes to a similar conclusion, albeit with some small differences between the two rodent species.

      Strengths:

      The data show effects of ornithine on taste/intake in laboratory rats: in two-bottle and briefer intake tests, adding ornithine results in higher intake of most, but not all, stimuli tested. Bilateral chorda tympani (CT) nerve cuts or the addition of GPRC6A antagonists decreased or eliminated these effects. Ornithine also evoked responses by itself in the CT nerve, but mainly at higher concentrations; at lower concentrations it potentiated the response to monosodium glutamate. Finally, immunocytochemistry of taste cell expression indicated that GPRC6A was expressed predominantly in the anterior tongue and co-localized (to a small extent) only with IP3R3, indicative of expression in a subset of type II taste receptor cells.

      Weaknesses:

      As the authors are aware, it is difficult to assess a complex human taste with complex attributes, such as kokumi, in an animal model. In these experiments they attempt to uncover mechanistic insights about how ornithine potentiates other stimuli by using a variety of established experimental approaches in rats. They partially succeed by finding evidence that GPRC6A may mediate effects of ornithine when it is used at lower concentrations. In the revision they have scaled back their interpretations accordingly. A supplementary experiment measuring certain aspects of the effects of ornithine added to Miso soup in human subjects is included for the express purpose of establishing that the kokumi sensation of a complex solution is enhanced by ornithine; however, they do not use any such complex solutions in the rat studies. Moreover, the sample size of the human experiment is (still) small - it really doesn't belong in the same manuscript with the rat studies.

      Despite the reviewer’s suggestion, we would like to include the human sensory experiment. Our rationale is that we must first demonstrate that the kokumi of miso soup is enhanced by the addition of ornithine, which is then followed by basic animal experiments to investigate the underlying mechanisms of kokumi in humans.

      We did not present the additive effects of ornithine on miso soup in the present rat study because our previous companion paper (Fig. 1B in Mizuta et al., 2021, Ref. #26) already confirmed that miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) was statistically significantly (P < 0.001) preferred to plain miso soup by mice.

      Furthermore, we believe that our sample size (n = 22) is comparable to those employed in other studies. For example, the representative kokumi studies by Ohsu et al. (Ref. #9), Ueda et al. (Ref. #10), Shibata et al. (Ref. #20), Dunkel et al. (Ref. #37), and Yang et al. (Ref. #44) used sample sizes of 20, 19, 17, 9, and 15, respectively.

      Reviewer #2 (Public review):

      Summary:

      The authors used rats to determine the receptor for a food-related perception (kokumi) that has been characterized in humans. They employ a combination of behavioral, electrophysiological, and immunohistochemical results to support their conclusion that ornithine-mediated kokumi effects are mediated by the GPRC6A receptor. They complemented the rat data with some human psychophysical data. I find the results intriguing, but believe that the authors overinterpret their data.

      Strengths:

      The authors provide compelling evidence that ornithine enhances the palatability of several chemical stimuli (i.e., IMP, MSG, MPG, Intralipos, sucrose, NaCl, quinine). Ornithine also increases CT nerve responses to MSG. Additionally, the authors provide evidence that the effects of ornithine are mediated by GPRC6A, a G-protein-coupled receptor family C group 6 subtype A, and that this receptor is expressed primarily in fungiform taste buds. Taken together, these results indicate that ornithine enhances the palatability of multiple taste stimuli in rats and that the enhancement is mediated, at least in part, within fungiform taste buds. This is an important finding that could stand on its own. The question of whether ornithine produces these effects by eliciting kokumi-like perceptions (see below) should be presented as speculation in the Discussion section.

      Weaknesses:

      I am still unconvinced that the measurements in rats reflect the "kokumi" taste percept described in humans. The authors conducted long-term preference tests, 10-min avidity tests and whole chorda tympani (CT) nerve recordings. None of these procedures specifically model features of "kokumi" perception in humans, which (according to the authors) include increasing "intensity of whole complex tastes (rich flavor with complex tastes), mouthfulness (spread of taste and flavor throughout the oral cavity), and persistence of taste (lingering flavor)." While it may be possible to develop behavioral assays in rats (or mice) that effectively model kokumi taste perception in humans, the authors have not made any effort to do so. As a result, I do not think that the rat data provide support for the main conclusion of the study--that "ornithine is a kokumi substance and GPRC6A is a novel kokumi receptor."

      Kokumi can be assessed in humans, as demonstrated by the enhanced kokumi perception observed when miso soup is supplemented with ornithine (Fig. S1). Currently, we do not have a method to measure the same kokumi perception in animals. However, in the two-bottle preference test, our previous companion paper (Fig. 1B in Mizuta et al. 2021, Ref. #26) confirmed that miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) was statistically significantly (P < 0.001) preferred over plain miso soup by mice.

      Of the three attributes of kokumi perception in humans, the “intensity of whole complex tastes (rich flavor with complex tastes)” was partly demonstrated in the present rat study. In contrast, “mouthfulness (the spread of taste and flavor throughout the oral cavity)” could not be directly detected in animals and had to be inferred in the Discussion. “Persistence of taste (lingering flavor)” was evident at least in the chorda tympani responses; however, because the tongue was rinsed 30 seconds after the onset of stimulation, the duration of the response was not fully recorded.

      It is well accepted in sensory physiology that the stronger the stimulus, the larger the tonic response—and consequently, the longer it takes for the response to return to baseline. For example, Kawasaki et al. (2016, Ref. #45) clearly showed that the duration of sensation increased proportionally with the concentration of MSG, lactic acid, and NaCl in human sensory tests. The essence of this explanation has been incorporated into the Discussion (p. 12).

      Why are the authors hypothesizing that the primary impacts of ornithine are on the peripheral taste system? While the CT recordings provide support for peripheral taste enhancement, they do not rule out the possibility of additional central enhancement. Indeed, based on the definition of human kokumi described above, it is likely that the effects of kokumi stimuli in humans are mediated at least in part by the central flavor system.

      We agree with the reviewer’s comment. Our CT recordings indicate that the effects of kokumi stimuli on taste enhancement occur primarily at the peripheral taste organs. The resulting sensory signals are then transmitted to the brain, where they are processed by the central gustatory and flavor systems, ultimately giving rise to kokumi attributes. This central involvement in kokumi perception is discussed on page 12. Although kokumi substances exert their effects at low concentrations—levels at which the substance itself (e.g., ornithine) does not become more favorable or (in the case of γ-Glu-Val-Gly) exhibits no distinct taste—we cannot rule out the possibility that even faint taste signals from these substances are transmitted to the brain and interact with other taste modalities.

      The authors include (in the supplemental data section) a pilot study that examined the impact of ornithine on a variety of subjective measures of flavor perception in humans. The presence of this pilot study within the larger rat study does not really make sense. While I agree with the authors that there is value in conducting parallel tests in both humans and rodents, I think that this can only be done effectively when the measurements in both species are the same. For this reason, I recommend that the human data be published in a separate article.

      Despite the reviewer’s suggestion, we intend to include the human sensory experiment. Our rationale is that we must first demonstrate that the kokumi of miso soup is enhanced by the addition of ornithine, and then follow up with basic animal experiments to investigate the potential underlying mechanisms of kokumi in humans.

      In our previous companion paper (Fig. 1B in Mizuta et al., 2021, Ref. #26), we confirmed with statistical significance (P < 0.001) that mice preferred miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) over plain miso soup. However, as explained in our response to Reviewer #2’s first concern (in the Public review), it is difficult to measure two of the three kokumi attributes—aside from the “intensity of whole complex tastes (rich flavor with complex tastes)”—in animal models.

      The authors indicated on several occasions (e.g., see Abstract) that ornithine produced "synergistic" effects on the CT nerve response to chemical stimuli. "Synergy" is used to describe a situation where two stimuli produce an effect that is greater than the sum of the response to each stimulus alone (i.e., 2 + 2 = 5). As far as I can tell, the CT recordings in Fig. 3 do not reflect a synergism.

      We appreciate your comments regarding the definition of synergy. In Fig. 5 (not Fig. 3), please note the difference in the scaling of the ordinate between Fig. 5D (ornithine responses) and Fig. 5E (MSG responses). When both responses are presented on the same scale, it becomes evident that the response to 1 mM ornithine is negligibly small compared to the MSG response, which clearly indicates that the response to the mixture of MSG and 1 mM ornithine exceeds the sum of the individual responses to MSG and 1 mM ornithine. Therefore, we have described the effect as “synergistic” rather than “additive.” The same observation applies to the mouse experiments in our previous companion paper (Fig. 8 in Mizuta et al. 2021, Ref. #26), where synergistic effects are similarly demonstrated by graphical representation. We have also added the following sentence to the legend of Fig. 5:

      “Note the different scaling of the ordinate in (D) and (E).”
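      Stated as an inequality, the synergy criterion applied above compares the chorda tympani response to the mixture with the sum of the responses to its components. Here R(·) is notation introduced only for this illustration, denoting the response magnitude to the stimulus in parentheses:

      $$R(\mathrm{MSG} + \mathrm{Orn}_{1\,\mathrm{mM}}) > R(\mathrm{MSG}) + R(\mathrm{Orn}_{1\,\mathrm{mM}})$$

      An additive effect would correspond to equality. Because R(Orn<sub>1 mM</sub>) is negligibly small relative to R(MSG), the inequality holds whenever the mixture response exceeds the MSG response by more than that small ornithine response.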

      Reviewer #3 (Public review):

      Summary:

      In this study the authors set out to investigate whether GPRC6A mediates kokumi taste initiated by the amino acid L-ornithine. They used Wistar rats, a standard laboratory strain, as the primary model and also performed an informative taste test in humans, in which miso soup was supplemented with various concentrations of L-ornithine. The findings are valuable and overall the evidence is solid. L-Ornithine should be considered to be a useful test substance in future studies of kokumi taste and the class C G protein coupled receptor known as GPRC6A (C6A) along with its homolog, the calcium-sensing receptor (CaSR) should be considered candidate mediators of kokumi taste. The researchers confirmed in rats their previous work on Ornithine and C6A in mice (Mizuta et al Nutrients 2021).

      Strengths:

      The overall experimental design is solid, based on two-bottle preference tests in rats. After determining the optimal concentration of L-Ornithine (1 mM) in the presence of MSG, it was added to various tastants including: inosine 5'-monophosphate; monosodium glutamate (MSG); mono-potassium glutamate (MPG); intralipos (a soybean oil emulsion); sucrose; sodium chloride (NaCl; salt); citric acid (sour) and quinine hydrochloride (bitter). Robust effects of ornithine were observed in the cases of IMP, MSG, MPG and sucrose, and little or no effects were observed in the cases of sodium chloride, citric acid and quinine HCl. The researchers then focused on the preference for Ornithine-containing MSG solutions. Inclusion of the C6A inhibitors Calindol (0.3 mM but not 0.06 mM) or the gallate derivative EGCG (0.1 mM but not 0.03 mM) eliminated the preference for solutions that contained Ornithine in addition to MSG. The researchers next performed transections of the chorda tympani nerves (with sham operation controls) in anesthetized rats to identify a role of the chorda tympani branches of the facial nerves (cranial nerve VII) in the preference for Ornithine-containing MSG solutions. This finding implicates the anterior half to two-thirds of the tongue in ornithine-induced kokumi taste. They then used electrical recordings from intact chorda tympani nerves in anesthetized rats to demonstrate that ornithine enhanced MSG-induced responses following the application of tastants to the anterior surface of the tongue. They went on to show that this enhanced response was insensitive to amiloride, selected to inhibit 'salt tastant' responses mediated by the epithelial Na+ channel, but eliminated by Calindol.
      Finally, they performed immunohistochemistry on sections of rat tongue, demonstrating C6A-positive spindle-shaped cells in fungiform papillae whose distribution partially overlapped with that of the IP3 type-3 receptor, used as a marker of Type-II cells, but not with (i) gustducin, the G protein partner of Tas1 receptors (T1Rs), used as a marker of a subset of type-II cells; or (ii) 5-HT (serotonin) and Synaptosome-associated protein 25 kDa (SNAP-25), used as markers of Type-III cells.

      At least two other receptors in addition to C6A might mediate taste responses to ornithine: (i) the CaSR, which binds and responds to multiple L-amino acids (Conigrave et al, PNAS 2000), and which has been previously reported to mediate kokumi taste (Ohsu et al., JBC 2010) as well as responses to Ornithine (Shin et al., Cell Signaling 2020); and (ii) T1R1/T1R3 heterodimers which also respond to L-amino acids and exhibit enhanced responses to IMP (Nelson et al., Nature 2001). These alternatives are appropriately discussed and, taken together, the experimental results favor the authors' interpretation that C6A mediates the Ornithine responses. The authors provide preliminary data in Suppl. 3 for the possibility of co-expression of C6A with the CaSR.

      Weaknesses:

      The authors point out that animal models pose some difficulties of interpretation in studies of taste and raise the possibility in the Discussion that umami substances may enhance the taste response to ornithine (Line 271, Page 9).

      Ornithine and umami substances interact to produce synergistic effects in both directions—ornithine enhances responses to umami substances, and vice versa. These effects may depend on the concentrations used, as described in the Discussion (pp. 9–10). Further studies are required to clarify the precise nature of this interaction.

      One issue that is not addressed, and could be usefully addressed in the Discussion, relates to the potential effects of kokumi substances on the threshold concentrations of key tastants such as glutamate. Thus, an extension of taste distribution to additional areas of the mouth (previously referred to as 'mouthfulness') and persistence of taste/flavor responses (previously referred to as 'continuity') could arise from a reduction in the threshold concentrations of umami and other substances that evoke taste responses.

      Thank you for this important suggestion. If ornithine reduces the threshold concentrations of tastants—including glutamate—and enhances their suprathreshold responses, then adding ornithine may activate additional taste cells. This effect could explain kokumi attributes such as an “extension of taste distribution” and possibly the “persistence of responses.” As shown in Fig. 2, the lowest concentrations used for each taste stimulus are near or below the thresholds, which indicates that threshold concentrations are reduced—especially for MSG and MPG. We have incorporated this possibility into the Discussion as follows (p.12):

      “Kokumi substances may reduce the threshold concentrations of tastants as well as increase their suprathreshold responses. Once the threshold concentrations are lowered, additional taste cells in the oral cavity become activated, and this information is transmitted to the brain. As a result, the brain perceives this input as coming from a wider area of the mouth.”

      The status of one of the compounds used as an inhibitor of C6A, the gallate derivative EGCG, as a potential inhibitor of the CaSR or T1R1/T1R3 is unknown. It would have been helpful to show that a specific inhibitor of the CaSR failed to block the ornithine response.

      Thank you for this important comment. We attempted to identify a specific inhibitor of CaSR. Although we considered using NPS-2143—a commonly used CaSR inhibitor—it is known to also inhibit GPRC6A. We agree that using a specific CaSR inhibitor would be beneficial and plan to pursue this in future studies.

      It would have been helpful to include a positive control kokumi substance in the two-bottle preference experiment (e.g., one of the known gamma-glutamyl peptides such as gamma-Glu-Val-Gly or glutathione), to compare the relative potencies of the control kokumi compound and ornithine, and to compare the sensitivities of the two responses to C6A and CaSR inhibitors.

      We agree with this comment. In retrospect, it may have been advantageous to directly compare the potencies of CaSR and GPRC6A agonists in enhancing taste preferences—and to evaluate the sensitivity of these preferences to CaSR and GPRC6A antagonists. However, we did not include γ-Glu-Val-Gly in the present study because we have already reported its supplementation effects on the ingestion of basic taste solutions in rats using the same methodology in a separate paper (Yamamoto and Mizuta, 2022, Ref. #25). The results from both studies are compared in the Discussion (p. 11).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major:

      I am not convinced by the Author's arguments for including the human data. I appreciate their efforts in adding a few (5) subjects and improving the description, but it still feels like it is shoehorned into this paper, and would be better published as a different manuscript.

      This human study is short, but it is complete rather than preliminary. Our rationale for including the human data as supplementary information is given in our response to the reviewer’s public review.

      Minor concerns:

      Page 3 paragraph 1: Suggest "contributing to palatability".

      Thank you for this suggestion. We have rewritten the text as follows:

      “…, the brain further processes these sensations to evoke emotional responses, contributing to palatability or unpleasantness”.

      Page 4 paragraph 2: The text still assumes that "kokumi" is a meaningful descriptor for what rodents experience. Re-wording the following sentence like this could help:

      "Neuroscientific studies in mice and rats provide evidence that glutathione and γ-Glu-Val-Gly activate CaSRs, and modify behavioral responses to other tastants in a way that may correspond to kokumi taste as experienced by humans. However, to our..."

      Or something similar.

      Thank you for this suggestion. We have rewritten the sentence according to your suggestion as follows:

      “Neuroscientific studies (23,25,30) in mice and rats provide evidence that glutathione and γ-Glu-Val-Gly activate CaSRs, and modify behavioral responses to other tastants in a way that may correspond to kokumi as experienced by humans”.

      Page 7 paragraph 1 - put the concentrations of Calindol and EGCG used (in the physiology exps) in the text.

      We have added the concentrations: “300 µM calindol and 100 µM EGCG”.

      Reviewer #2 (Recommendations for the authors):

      I have included all of my recommendations in the public review section.

      Reviewer #3 (Recommendations for the authors):

      Although the definitions of 'thickness', 'mouthfulness' and 'continuity' have been revised very helpfully in the Introduction, 'mouthfulness' reappears at other points in the MS e.g., Page 4, Results, Line 3; Page 9, Line 3. It is best replaced by the new definition in these other locations too.

      We wish to clarify that our revised text stated, “…to clarify that kokumi attributes are inherently gustatory, in the present study we use the terms ‘intensity of whole complex tastes (rich flavor with complex tastes)’ instead of ‘thickness,’ ‘mouthfulness (spread of taste and flavor throughout the oral cavity)’ instead of ‘continuity,’ and ‘persistence of taste (lingering flavor)’ instead of ‘continuity.’” The term “mouthfulness” was retained in our text, though we provided a more specific explanation. In the re-revised version, we have added “(spread of taste in the oral cavity)” immediately after “mouthfulness.”

      I doubt that many scientific readers will be familiar with the term 'intragemmal nerve fibres' (Page 8, Line 4). It is used appropriately but it would be helpful to briefly define/explain it.

      We have added an explanation as follows:

      “… intragemmal nerve fibers, which are nerve processes that extend directly into the structure of the taste bud to transmit taste signals from taste cells to the brain.”

      I previously pointed out the overlap between the CaSR's amino acid (AA) and gamma-glutamyl-peptide binding site. I was surprised by the authors' response which appeared to miss the point being made. It was based on the impacts of selected mutations in the receptor's Venus FlyTrap domain (Broadhead JBC 2011) on the responses to AAs and glutathione analogs. The significantly more active analog, S-methylglutathione is of additional interest because, like glutathione itself, it is present in mammalian body fluids. My apologies to the authors for not more carefully explaining this point.

      Thank you for this comment. Both CaSR and GPRC6A are recognized as broad-spectrum amino acid sensors; however, their agonist profiles differ. Aromatic amino acids preferentially activate CaSR, whereas basic amino acids tend to activate GPRC6A. For instance, among basic amino acids, ornithine is a potent and specific activator of GPRC6A, whereas CaSR, in addition to responding to amino acids, is activated with high potency by γ-Glu-Val-Gly. It remains unclear how effectively ornithine activates CaSR and whether γ-glutamyl peptides also activate GPRC6A. These questions should be addressed in future studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study uses consensus-independent component analysis to highlight transcriptional components (TCs) in high-grade serous ovarian cancers (HGSOC). The study presents a convincing preliminary finding by identifying a TC linked to synaptic signaling that is associated with shorter overall survival in HGSOC patients, highlighting the potential role of neuronal interactions in the tumour microenvironment. This finding is corroborated by spatially resolved transcriptomics in a small-scale study; a weakness is that the study is descriptive and non-mechanistic and requires experimental validation.

      We sincerely thank the editors for their valuable and constructive feedback. We are grateful for the recognition of our findings and the importance of identifying transcriptional components in high-grade serous ovarian cancers.

      We acknowledge the editors’ observation regarding the descriptive nature of our study and its limited mechanistic depth. We agree that additional experimental validation would further strengthen our conclusions, and we are currently planning and conducting experiments for a future study to provide mechanistic insights into the associations found here. In addition, recent reviews of the emerging field of cancer neuroscience emphasize that the field is at an early stage, particularly with respect to a mechanistic understanding of how tumor-infiltrating nerves contribute to tumor initiation and progression (Amit et al., 2024; Hwang et al., 2024). Nonetheless, we wish to emphasize that emerging mechanistic preclinical studies have demonstrated the influence of tumour-infiltrating nerves on disease progression (Allen et al., 2018; Balood et al., 2022; Darragh et al., 2024; Globig et al., 2023; Jin et al., 2022; Restaino et al., 2023; Zahalka et al., 2017). Several of these studies include contributions from our co-authors and feature in vitro and in vivo research on head and neck squamous cell carcinoma as well as high-grade serous ovarian carcinoma samples. The present study further supports this preclinical work by showing, in patient data, the potential relevance of neuronal signaling to disease outcome.

      For instance, Restaino et al. (2023) demonstrated that substance P, released from tumour-infiltrating nociceptors, potentiates MAP kinase signaling in cancer cells, thereby driving disease progression. Crucially, this effect was shown to be reversible in vivo by blocking the substance P receptor (Restaino et al., 2023). These findings offer compelling evidence of the role of tumour innervation in cancer biology.

      Our current study in tumor samples of patients with high-grade serous ovarian cancer identifies a transcriptional component that is enriched for genes for which the protein is located in the synapse. We believe that the previously published mechanistic insights support our findings and suggest that this transcriptional component could serve as a valuable screening tool to identify innervated tumours based on bulk transcriptomes. Clinically, this information is highly relevant, as patients with innervated tumours may benefit from alternate therapeutic strategies targeting these innervations.

      Reviewer #1 (Public review)

      This manuscript explores the transcriptional landscape of high-grade serous ovarian cancer (HGSOC) using consensus-independent component analysis (c-ICA) to identify transcriptional components (TCs) associated with patient outcomes. The study analyzes 678 HGSOC transcriptomes, supplemented with 447 transcriptomes from other ovarian cancer types and noncancerous tissues. By identifying 374 TCs, the authors aim to uncover subtle transcriptional patterns that could serve as novel drug targets. Notably, a transcriptional component linked to synaptic signaling was associated with shorter overall survival (OS) in patients, suggesting a potential role for neuronal interactions in the tumour microenvironment. Given notable weaknesses, such as the lack of a validation cohort or of validation using another platform (other than the 11 samples with ST), the data are considered highly descriptive and preliminary.

      Strengths:

      (1) Innovative Methodology:

      The use of c-ICA to dissect bulk transcriptomes into independent components is a novel approach that allows for the identification of subtle transcriptional patterns that may be overshadowed in traditional analyses.

      We thank the reviewer for recognizing the strengths and novelty of our study. We appreciate the positive feedback on using consensus-independent component analysis (c-ICA) to decompose bulk transcriptomes, which allowed us to detect subtle transcriptional signals often overlooked in traditional analyses.

      (2) Comprehensive Data Integration:

      The study integrates a large dataset from multiple public repositories, enhancing the robustness of the findings. The inclusion of spatially resolved transcriptomes adds a valuable dimension to the analysis.

      We thank the reviewer for recognizing the robustness of our study through comprehensive data integration. We appreciate the acknowledgment of our efforts to leverage a large, multi-source dataset, as well as the additional insights gained from spatially resolved transcriptomes. We consider that this integrative approach enhances the depth of our analysis and contributes to a more nuanced understanding of the tumour microenvironment.

      (3) Clinical Relevance:

      The identification of a synaptic signaling-related TC associated with poor prognosis highlights a potential new avenue for therapeutic intervention, emphasizing the role of the tumour microenvironment in cancer progression.

      We appreciate the recognition of the clinical implications of our findings. The identification of a synaptic signaling-related transcriptional component associated with poor prognosis underscores the potential for novel therapeutic targets within the tumour microenvironment. We agree that this insight could open new avenues for intervention and further highlights the role of neuronal interactions in cancer progression.

      Weaknesses:

      (1) Mechanistic Insights:

      While the study identifies TCs associated with survival, it provides limited mechanistic insights into how these components influence cancer progression. Further experimental validation is necessary to elucidate the underlying biological processes.

      We acknowledge the point regarding the limited mechanistic insights provided in our study. We agree that further experimental validation would significantly enhance our understanding of how the biological processes captured by these transcriptional components influence cancer progression. We are currently planning and conducting experiments for a future study to provide mechanistic insights into the associations found in this study.

      Our analyses were performed on publicly available bulk and spatially resolved expression profiles. To investigate the underlying mechanisms in future studies, we plan to integrate spatial transcriptomic data with immunohistochemical analysis of the same tumour samples to validate our findings. Additionally, we have initiated efforts to set up in vitro co-cultures of neurons and ovarian cancer cells. These co-cultures will enable us to investigate how synaptic signaling impacts ovarian cancer cell behavior.

      (2) Generalizability:

      The findings are primarily based on transcriptomic data from HGSOC. It remains unclear how these results apply to other subtypes of ovarian cancer or different cancer types.

      To address this remark, we used survival data from Bolton et al. (2022) and from TCGA to investigate associations between TC activity scores and overall survival in patients with ovarian clear cell carcinoma (the second most common subtype of epithelial ovarian cancer) and in patients with other cancer types, respectively. However, we acknowledge the limitations of TCGA survival data, as highlighted in the referenced article (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8726696/). Additionally, as shown in Figure 5, we provided evidence of TC121 activity across various cancer types, suggesting broader relevance. For the results of these analyses, please refer to our response to remark 1.3 in the recommendations section (page 4).

      (3) Innovative Methodology:

      Requires more validation using different platforms (IHC) to validate the performance of this bulk-derived data. Also, the lack of control over data quality is a concern.

      We acknowledge the value of validating our results with alternative platforms such as IHC. We are planning and executing the experiments for a future study to provide mechanistic insights into the associations found in this study.

      Regarding data quality control, we implemented the following measures to ensure the reliability of our analysis:

      Bulk Transcriptional Profiles: To assess data quality, we conducted principal component analysis (PCA) on the sample Pearson product-moment correlation matrix. The first principal component (PCqc), which explains approximately 80-90% of the variance, was used to distinguish technical variability from biological signals (Bhattacharya et al., 2020). Samples with a correlation coefficient below 0.8 relative to PCqc were identified as outliers and excluded. Additionally, MD5 hash values were generated for each CEL file to identify and remove duplicate samples. Expression values were standardized to a mean of zero and a variance of one for each gene to minimize probeset- or gene-specific variability across datasets (GEO, CCLE, GDSC, and TCGA).
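      The bulk QC steps described above can be sketched in Python/NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the expression matrix is assumed to be genes × samples, "correlation relative to PCqc" is interpreted as the Pearson correlation between each sample's profile and the gene-level score profile of the first principal component of the sample correlation matrix, and CEL files are assumed to be available as raw bytes in memory. All function names are hypothetical.

```python
import hashlib

import numpy as np


def unique_by_md5(files):
    """Keep the first file for each distinct MD5 digest.

    files: dict mapping file name -> raw bytes (hypothetical in-memory input).
    Byte-identical duplicates share a digest and are dropped.
    """
    seen = set()
    kept = []
    for name, data in files.items():
        digest = hashlib.md5(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(name)
    return kept


def pcqc_outliers(expr, threshold=0.8):
    """Flag samples whose profile correlates below `threshold` with PCqc.

    expr: genes x samples. This is an assumed reading of the QC step.
    Returns (outlier flags, per-sample correlations, variance explained by PC1).
    """
    # Sample-by-sample Pearson correlation matrix
    corr = np.corrcoef(expr, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the last pair is PC1
    eigvals, eigvecs = np.linalg.eigh(corr)
    loadings = eigvecs[:, -1]
    if loadings.sum() < 0:  # eigenvector sign is arbitrary; make it positive
        loadings = -loadings
    explained = eigvals[-1] / eigvals.sum()
    # Standardize each sample column, then build the gene-level PCqc profile
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    pcqc = z @ loadings
    r = np.array([np.corrcoef(expr[:, j], pcqc)[0, 1] for j in range(expr.shape[1])])
    return r < threshold, r, explained


def standardize_genes(expr):
    """Scale each gene (row) to mean 0 and variance 1 across samples."""
    mu = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    return (expr - mu) / sd
```

      The idea behind the correlation check is that PCqc captures the profile shared by most arrays, so a sample that correlates poorly with it is likely dominated by technical artefacts rather than biology.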

      Spatial Transcriptional Profiles: PCA was also applied to spatial transcriptomic data for quality control. Only samples with consistent loading factor signs for the first principal component across all individual spot profiles were retained. Samples failing this criterion were excluded from further analyses.
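      The spatial QC criterion can be illustrated with a short NumPy sketch. It assumes that each spatial sample is a genes × spots matrix and that "loading factor signs" refers to the entries of the leading eigenvector of the spot-by-spot correlation matrix; both are assumptions, and the function name is hypothetical.

```python
import numpy as np


def pc1_signs_consistent(spots):
    """Return True if all first-PC loadings of a spatial sample share one sign.

    spots: genes x spots matrix for one spatial sample (assumed layout).
    """
    corr = np.corrcoef(spots, rowvar=False)  # spot-by-spot correlations
    _, eigvecs = np.linalg.eigh(corr)        # eigenvalues in ascending order
    loadings = eigvecs[:, -1]                # eigenvector of the largest eigenvalue
    return bool(np.all(loadings > 0) or np.all(loadings < 0))
```

      When every spot-to-spot correlation is positive, the Perron-Frobenius theorem guarantees single-signed leading loadings, so a sample fails this check only when some spots are anti-correlated with the dominant profile.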

      (4) Clinical Application:

      Although the study suggests potential drug targets, the translation of these findings into clinical practice is not addressed. Given the lack of some QA/QC procedures, it will probably be hard to translate these results. Future studies should focus on validating these targets in clinical settings.

      Regarding clinical applications, we acknowledge the importance of further exploring strategies targeting synaptic signaling and neurotransmitter release in the tumour microenvironment (TME). As partially discussed in the first version of the manuscript, drugs such as ifenprodil and lamotrigine, which are commonly used to treat neuronal disorders, can block glutamate release, thereby inhibiting subsequent synaptic signaling. Additionally, the vesicular monoamine transporter (VMAT) inhibitor reserpine blocks the formation of synaptic vesicles (Reid et al., 2013; Williams et al., 2001). Previous in vitro studies with HGSOC cell lines demonstrated that ifenprodil significantly reduced cancer cell proliferation, while reserpine triggered apoptosis in cancer cells (North et al., 2015; Ramamoorthy et al., 2019). These findings highlight the potential of such approaches to disrupt synaptic neurotransmission in the TME.

      To address potential translation of our findings into clinical practice more comprehensively, we have included additional details in the manuscript:

      Discussion section, page 16, lines 338-341:

      “This interaction can be targeted with pan-TRK inhibitors such as entrectinib and larotrectinib. Both drugs are showing promising results in multiple phase II trials, including ovarian cancer and breast cancer patients. Furthermore, a TRKB-specific inhibitor was developed (ANA-12), but has not been subjected to any clinical trials in cancer so far (Ardini et al., 2016; Burris et al., 2015; Drilon et al., 2018, 2017).”

      On page 17, lines 361-374:

      “Strategies to disrupt neuronal signaling and neurotransmitter release in neurons target key elements of excitatory neurotransmission, such as calcium flux and vesicle formation. Drugs like ifenprodil and lamotrigine, commonly used to treat neuronal disorders, block glutamate release and subsequent neuronal signaling. Additionally, the vesicular monoamine transporter (VMAT) inhibitor reserpine prevents synaptic vesicle formation (Reid et al., 2013; Williams, 2001). In vitro studies with HGSOC cell lines have demonstrated that ifenprodil significantly inhibits tumour proliferation, while reserpine induces apoptosis in cancer cells (North et al., 2015; Ramamoorthy et al., 2019). These approaches hold promise for inhibiting neuronal signaling and interactions in the TME.”

      Reviewer #2 (Public review):

      Summary:

      Consensus-independent component analysis and closely related methods have previously been used to reveal components of transcriptomic data that are not captured by principal component or gene-gene coexpression analyses.

      Here, the authors asked whether applying consensus-independent component analysis (c-ICA) to published high-grade serous ovarian cancer (HGSOC) microarray-based transcriptomes would reveal subtle transcriptional patterns that are not captured by existing molecular omics classifications of HGSOC.

      Statistical associations of these (hitherto masked) transcriptional components with prognostic outcomes in HGSOC could lead to additional insights into underlying mechanisms and, coupled with corroborating evidence from spatial transcriptomics, are proposed for further investigation.

      This approach is complementary to existing transcriptomics classifications of HGSOC.

      The authors have previously applied the same approach in colorectal carcinoma (Knapen et al. (2024) Commun. Med).

      Strengths:

      (1) Overall, this study describes a solid data-driven description of c-ICA-derived transcriptional components that the authors identified in HGSOC microarray transcriptomics data, supported by detailed methods and supplementary documentation.

      We thank the reviewer for acknowledging the strength of our data-driven approach and the use of consensus-independent component analysis (c-ICA) to identify transcriptional components within HGSOC microarray data. We aimed to provide comprehensive methodological detail and supplementary documentation to support the reproducibility and robustness of our findings. We believe this approach allows for the identification of subtle transcriptional signals that might have been overlooked by traditional analysis methods.

      (2) The biological interpretation of transcriptional components is convincing based on (data-driven) permutation analysis and a suite of analyses of association with copy-number, gene sets, and prognostic outcomes.

      We appreciate the positive feedback on the biological interpretation of our transcriptional components. We are pleased that our approach, which includes data-driven permutation testing and analyses of associations with copy-number alterations, gene sets, and prognostic outcomes, was found to be convincing. These analyses were integral to enhancing our findings’ robustness and biological relevance.

      (3) The resulting annotated transcriptional components have been made available in a searchable online format.

      Thank you for this important positive remark.

      (4) For the highlighted transcriptional component which has been annotated as related to synaptic signalling, the detection of the transcriptional component among 11 published spatial transcriptomics samples from ovarian cancers appears to support this preliminary finding and requires further mechanistic follow-up.

      Thank you for acknowledging the accessibility of our annotated transcriptional components. We prioritized making these data available in a searchable online format to facilitate further research and enable the community to explore and validate our findings.

      Weaknesses:

      (1) This study has not explicitly compared the c-ICA transcriptional components to the existing reported transcriptional landscape and classifications for ovarian cancers (e.g. Smith et al Nat Comms 2023; TCGA Nature 2011; Engqvist et al Sci Rep 2020) which would enable a further assessment of the additional contribution of c-ICA - whether the cICA approach captured entirely complementary components, or whether some components are correlated with the existing reported ovarian transcriptomic classifications.

      We acknowledge the reviewer’s insightful suggestion to compare our c-ICA-derived transcriptional components with previously reported ovarian cancer classifications, such as those from Smith et al. (2023), TCGA (2011), and Engqvist et al. (2020). To address this, we incorporated analyses comparing the activity scores of our transcriptional components with these published landscapes and classifications, particularly focusing on any associations with overall survival. Additionally, we evaluated correlations between gene signatures from a subset of these studies and our identified TCs, enhancing our understanding of the unique contributions of the c-ICA approach. Please refer to our response to remark 10 for the results of these analyses.

      (2) Here, the authors primarily interpret the c-ICA transcriptional components as a deconvolution of bulk transcriptomics due to the presence of cells from tumour cells and the tumour microenvironment.

      However, c-ICA is not explicitly a deconvolution method with respect to cell types: the transcriptional components do not necessarily correspond to distinct cell types, and may reflect differential dysregulation within a cell type. This application of c-ICA for the purpose of data-driven deconvolution of cell populations is distinct from other deconvolution methods that explicitly use a prior cell signature matrix.

      We acknowledge that c-ICA, unlike traditional deconvolution methods, is not specifically designed for cell-type deconvolution and does not rely on a predefined cell signature matrix. While we explored the transcriptional components in the context of tumour and microenvironmental interactions, we agree that these components may not correspond directly to distinct cell types but rather reflect complex patterns of dysregulation, potentially within individual cell populations.

      Our goal with c-ICA was to uncover hidden transcriptional patterns possibly influenced by cellular heterogeneity. However, we recognize these patterns may also arise from regulatory processes within a single cell type. To investigate further, we used single-cell transcriptional data (~60,000 cell-type-annotated profiles from GSE158722) and projected our transcriptional components onto these profiles to obtain activity scores, allowing us to assess each TC’s behavior across diverse cellular contexts after removing the first principal component to minimize background effects. Please refer to our response to remark 2.2 in the recommendations to the authors (page 14) for the results of this analysis.
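      The projection step described above can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes the TCs are stored as a genes × TCs weight matrix, that the single-cell profiles are a genes × cells matrix that has already been centred, and that activity scores are obtained by a least-squares (pseudo-inverse) projection after subtracting the rank-1 first-principal-component reconstruction. Function names are hypothetical.

```python
import numpy as np


def remove_first_pc(profiles):
    """Subtract the rank-1 reconstruction of the first principal component.

    profiles: genes x cells, assumed already centred so that the leading
    singular vector pair corresponds to the first PC.
    """
    u, s, vt = np.linalg.svd(profiles, full_matrices=False)
    return profiles - s[0] * np.outer(u[:, 0], vt[0])


def tc_activity(components, profiles, drop_pc1=True):
    """Least-squares activity score of each TC in each profile.

    components: genes x TCs (gene weights of each transcriptional component).
    profiles: genes x cells.
    Returns a TCs x cells matrix of activity scores.
    """
    if drop_pc1:
        profiles = remove_first_pc(profiles)
    # pinv(components) @ profiles solves min ||components @ A - profiles||
    return np.linalg.pinv(components) @ profiles
```

      Removing the first component before projection discards the single strongest shared axis of variation, which in expression data often reflects global background rather than a specific biological process.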

      References

      Allen JK, Armaiz-Pena GN, Nagaraja AS, Sadaoui NC, Ortiz T, Dood R, Ozcan M, Herder DM, Haemerrle M, Gharpure KM, Rupaimoole R, Previs R, Wu SY, Pradeep S, Xu X, Han HD, Zand B, Dalton HJ, Taylor M, Hu W, Bottsford-Miller J, Moreno-Smith M, Kang Y, Mangala LS, Rodriguez-Aguayo C, Sehgal V, Spaeth EL, Ram PT, Wong ST, Marini FC, Lopez-Berestein G, Cole SW, Lutgendorf SK, diBiasi M, Sood AK. 2018. Sustained adrenergic signaling promotes intratumoral innervation through BDNF induction. Cancer Res 78 (12):3233-3242.

      Ardini E, Menichincheri M, Banfi P, Bosotti R, Ponti CD, Pulci R, Ballinari D, Ciomei M, Texido G, Degrassi A, Avanzi N, Amboldi N, Saccardo MB, Casero D, Orsini P, Bandiera T, Mologni L, Anderson D, Wei G, Harris J, Vernier J-M, Li G, Felder E, Donati D, Isacchi A, Pesenti E, Magnaghi P, Galvani A. 2016. Entrectinib, a Pan–TRK, ROS1, and ALK Inhibitor with activity in multiple molecularly defined cancer Indications. Mol Cancer Ther 15:628–639.

      Balood M, Ahmadi M, Eichwald T, Ahmadi A, Majdoubi A, Roversi Karine, Roversi Katiane, Lucido CT, Restaino AC, Huang S, Ji L, Huang K-C, Semerena E, Thomas SC, Trevino AE, Merrison H, Parrin A, Doyle B, Vermeer DW, Spanos WC, Williamson CS, Seehus CR, Foster SL, Dai H, Shu CJ, Rangachari M, Thibodeau J, Rincon SVD, Drapkin R, Rafei M, Ghasemlou N, Vermeer PD, Woolf CJ, Talbot S. 2022. Nociceptor neurons affect cancer immunosurveillance. Nature 611:405–412.

      Bhattacharya A, Bense RD, Urzúa-Traslaviña CG, Vries EGE de, Vugt MATM van, Fehrmann RSN. 2020. Transcriptional effects of copy number alterations in a large set of human cancers. Nat Commun 11:715.

      Burris HA, Shaw AT, Bauer TM, Farago AF, Doebele RC, Smith S, Nanda N, Cruickshank S, Low JA, Brose MS. 2015. Abstract 4529: Pharmacokinetics (PK) of LOXO-101 during the first-in-human Phase I study in patients with advanced solid tumors: Interim update. Cancer Res 75:4529–4529.

    1. Author response:

      We thank the reviewers for their evaluation, for helpful suggestions to improve clarity and accuracy, and for their positive reception of the manuscript. We will incorporate their suggestions in a revised manuscript. Here, we respond to their major comments. 

      The reviewers suggest that a molecular study of Hofstenia’s reproductive systems would be beneficial, as would mechanistic explanations for its unusual reproductive behavior. We agree with the reviewers that both of these would be interesting avenues, although we think this is outside the scope of this current manuscript. This manuscript studies growth and reproductive dynamics in acoels, and establishes a foundation to study its underlying molecular, developmental, and physiological machinery. 

      Our previous molecular work, using scRNAseq and FISH, identified several germline markers. Here, we show that two of them are specific markers of testes and ovaries, respectively. This, together with our new anatomical data, allows us to identify the expression domains of most of these other markers more clearly. Some markers may be expressed in a presumptive common germline that eventually splits into an anterior male germline and posterior female germline. We agree with the reviewers that understanding the dynamics of germline differentiation and its molecular genetic underpinnings would be very interesting, and we hope to address this in future work. 

      As the reviewers note, we do not understand how sperm is stored, how the worm’s own sperm can travel to its ovaries to enable selfing, or how eggs in the ovaries travel within the body. We agree with the reviewers that understanding these processes would be very interesting. Our histological and molecular work so far has been unable to find tube-like structures or other cavities for storage and transport. Potentially, cells could move within the parenchyma. Explaining these events will require substantial effort (including mechanistic studies of cell behavior and ultrastructural studies that the reviewers suggest), and we hope to do this in future work. 

      We agree with Reviewer 1 that it is interesting that Piwi-1 expression is only observed in the ovaries and not in the testes, which is unusual given its broad germline expression in many taxa. Although there are several possible explanations for this finding (e.g., Piwi-1 could be expressed at low levels in the male germline, other Piwi proteins may be expressed in the male germline, or Piwi may play roles in male germline progenitors that are not co-located with maturing sperm), we do not currently know why this is so, and we will discuss these possibilities in our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors report the role of a novel gene Aff3ir-ORF2 in flow-induced atherosclerosis. They show that the gene is anti-inflammatory in nature. It inhibits the IRF5-mediated athero-progression by inhibiting the causal factor (IRF5). Furthermore, the authors show a significant connection between shear stress and Aff3ir-ORF2 and its connection to IRF5 mediated athero-progression in different established mice models which further validates the ex vivo findings.

      Strengths:

      (1) An adequate number of replicates were used for this study.

      (2) Both in vitro and in vivo validation was done.

      (3) The figures are well presented.

      (4) In vivo causality is checked with cleverly designed experiments.

      We thank you for your positive remarks.

      Weaknesses:

      (1) Inflammatory proteins must be measured with standard methods, e.g., ELISA, as mRNA and protein levels do not always correlate.

      Thanks. We have followed your advice and performed ELISA experiments to measure the concentrations of inflammatory cytokines, including IL-6 and IL-1β. The newly acquired results have been included in Figure 2E (Lines 160-163) of the revised manuscript.

      (2) RNA-seq analysis has to be done very carefully. How does the Euclidean distance correlate with the differential expression of genes? Do they represent the neighborhood?

      If they do, how does this correlation affect the conclusion of the paper?

      We thank the reviewer for this professional comment and apologize for the confusion. The heatmap using Euclidean distance was generated based on the expression levels of all differentially expressed genes (calculated with DESeq2). Since its interpretation overlaps with the volcano plot presented in Figure 4B, we have moved the heatmap to Figure S5A in the revised manuscript and provided a detailed description in the figure legend (Lines 106-108 in the Supporting Information). Additionally, to better illustrate the variation among all samples, we have performed a PCA and included the new results in Figure 4A of the revised manuscript.

      (3) The volcano plot does not indicate the q value of the shown genes. It is advisable to calculate the q value for each of the genes which represents the FDR probability of the identified genes.

      Thank you for your careful review. We apologize for the incorrect labeling: the values shown were adjusted P values (P.adj). The label for Figure 4B has been corrected in the revised manuscript.

      (4) Was GO enrichment performed against the global gene set or a local gene set? The authors should provide more detailed information about the analysis.

      Thank you. We performed GO enrichment analysis against the global gene set. The description of the results has been updated in the revised manuscript (Lines 222–224).

      (5) If the analysis was performed against a global gene set, how does that connect with this specific atherosclerotic microenvironment?

      Thank you for your insightful comments. We have followed your advice and investigated the functional characteristics of these differentially expressed genes in the context of the atherosclerotic microenvironment. The RNA-seq differential gene list was further mapped onto the atherosclerosis-related gene dataset (PMID: 27374120), resulting in 363 overlapping genes. These 363 genes were subjected to bioinformatics enrichment analysis using the Gene Ontology (GO) database. GO analysis of these genes revealed enrichment in processes related to cell−cell adhesion and leukocyte activation involved in immune response (Figure S5B), which is highly consistent with the observed effects of AFF3ir-ORF2 on VCAM-1 expression. The newly acquired data are presented in Figure S5B, and the description of the results is included in the revised manuscript (Lines 227-233).

      (6) What was the basal expression of genes and how did the DGE (differential gene expression) values differ?

      Thanks for the comments. The RNA-sequencing data has been submitted to GEO datasets (GSE286206), making the basal gene expression data available to readers.

      The differential expression analysis was performed using DESeq2 (v1.4.5) (PMID: 25516281) with criteria of a 1.5-fold change and P<0.05. We have included this description in the revised manuscript in Lines 220-222 and Lines 575-576.
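      For readers less familiar with such thresholds, the selection criterion above can be illustrated with a short filter over a DESeq2-style results table. This is a minimal sketch under stated assumptions, not the pipeline used in this study: the record layout, gene names, and values are hypothetical; the field names `log2FoldChange` and `pvalue` follow the column names of DESeq2's results output; and "1.5-fold change" is assumed to mean a change of at least 1.5-fold in either direction (|log2FC| ≥ log2 1.5 ≈ 0.585). Whether the P<0.05 cutoff applies to raw or adjusted P values is not stated here, so the P-value field is left as a parameter.

```python
import math
from typing import Iterable, List


def filter_degs(rows: Iterable[dict], min_fold: float = 1.5,
                p_cutoff: float = 0.05, p_field: str = "pvalue") -> List[str]:
    """Return genes with |log2 fold change| >= log2(min_fold) and P < p_cutoff."""
    lfc_cutoff = math.log2(min_fold)  # 1.5-fold corresponds to about 0.585 on the log2 scale
    hits = []
    for row in rows:
        lfc = row.get("log2FoldChange")
        p = row.get(p_field)
        # DESeq2 can report NA for some genes; here missing values are None and are skipped
        if lfc is None or p is None:
            continue
        # Both up- and downregulated genes pass if the fold change is large enough
        if abs(lfc) >= lfc_cutoff and p < p_cutoff:
            hits.append(row["gene"])
    return hits


# Hypothetical toy records (not data from this study)
rows = [
    {"gene": "geneA", "log2FoldChange": 1.2, "pvalue": 0.001},
    {"gene": "geneB", "log2FoldChange": 0.3, "pvalue": 0.001},   # fold change too small
    {"gene": "geneC", "log2FoldChange": -0.8, "pvalue": 0.02},   # downregulated, passes
    {"gene": "geneD", "log2FoldChange": 2.0, "pvalue": 0.2},     # P value too large
    {"gene": "geneE", "log2FoldChange": None, "pvalue": None},   # missing values
]
print(filter_degs(rows))  # ['geneA', 'geneC']
```

      Passing `p_field="padj"` would apply the same cutoff to adjusted P values instead.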

      (7) How was IRF5 picked from the GO analysis? Was it within the 20 most significant genes?

      Sorry for the confusion. IRF5 was not identified through GO analysis. To determine the upstream transcriptional regulators, we used the ChEA3 database to predict potential upstream transcription factors based on all differentially expressed genes. The top 20 transcription factors were selected based on their scores. To further explore their relationship with atherosclerosis, these top 20 transcription factors were mapped to the atherosclerosis-related gene list in the DisGeNET database. IRF5 and IRF8 were the only two overlapping genes. To clarify this process, we have included a more detailed description of the IRF prediction approach in the revised manuscript (Lines 234–239).

      (8) Microscopic studies should be done more carefully. There seems to be global expression of Aff3ir-ORF2 across the vascular wall, and the expression seems similar to that of AFF3 in Figure 1.

      We thank the reviewer for the valuable suggestion. We have followed your advice and provided more representative images in Figure 1F.

      Reviewer #2 (Public review):

      Summary:

      The authors recently uncovered a novel nested gene, Aff3ir, and this work sets out to study its function in endothelial cells further. Based on differences in expression correlating with areas of altered shear stress, they investigate a role for the isoform Aff3ir-ORF2 in endothelial activation and development of atherosclerosis downstream of disturbed shear stress. Using a knockout mouse model and in vivo overexpression experiments, they demonstrate a strong potential for Aff3ir-ORF2 to alleviate atherosclerosis. They find that Aff3ir-ORF2 interacts with the pro-inflammatory transcription factor IRF5 and retains it in the cytoplasm, hence preventing upregulation of inflammation-associated genes. The data expands our knowledge of IRF5 regulation which could be relevant to researchers studying various inflammatory diseases as well as adding to our understanding of atherosclerosis development.

      Strengths:

      The in vivo data is solid using immunofluorescence staining to assess AFF3ir-ORF2 expression, a knockout mouse model, overexpression and knockdown studies, and rescue experiments in combination with two atherosclerotic models to demonstrate that Aff3ir-ORF2 can lessen atherosclerotic plaque formation in ApoE<sup>-/-</sup> mice.

      We thank you for your positive remarks.

      Weaknesses:

      While the in vivo data is generally convincing, a few data panels have issues and will need addressing. Also, the knockout mouse model will need to be described, since the paper referred to in the manuscript does not actually report any knockout mouse model. Hence it is unclear how Aff3ir-ORF2 is targeted, but Figure S2B shows that targeting is partial, since about 30% expression remains at the RNA level in MEFs isolated from the knockout mice.

      We thank you for the valuable comments. 

      First, we have followed your advice and included detailed information regarding the construction of the knockout mouse model in the revised manuscript in Lines 405-415. Additionally, the genotyping results have been included in new Figure S3A.

      Second, we acknowledge your concern about the knockout efficiency of ORF2 in mice. While the PCR assay indicated approximately 30% residual expression, our Western blot analysis of aorta samples demonstrated that ORF2 protein was barely detectable in knockout mice, as shown in new Figure S3B-C. In addition, our experiments using MEFs from WT and AFF3ir-ORF2<sup>-/-</sup> mice (Figure 4I) further confirmed successful knockout.

      Third, we have included a discussion addressing the discrepancies between the PCR and Western blot results. In addition to technical differences between the two methods, the nature of AFF3ir-ORF2 may also contribute to these inconsistencies. The parent gene AFF3 is located in a genetically variable region and can be excised via intron 5 to form a replicable transposon, which translocates to other chromosomes and has been linked to leukemia (PMID: 34995897, 12203795, 12743608, and 17968322). AFF3ir is located in intron 6; thus, it lies within the transposon, which may complicate the measurement of its expression. Replicable transposons can exist as extrachromosomal elements, allowing them to be inherited across generations. We have included this discussion in the revised manuscript in Lines 188-196.

      While the effect on atherosclerosis is clear, the conclusion that this is the result of reduced endothelial cell activation is not supported by the data. The mouse model is described as a global knockout and the shRNA knockdowns (Figure 5) and overexpression data in Figure 2 are not cell type-specific. Only the overexpression construct in Figure 6 uses an ICAM-2 promoter construct, which drives expression in endothelial cells, though leaky expression of this promoter has been reported in the literature. Therefore, other cell types such as smooth muscle cells or macrophages could be responsible for the effects observed.

      Thank you for your critical comment. To address your concern, we have made the following three revisions:

      First, we have analyzed the expression of AFF3ir-ORF2 in the vascular wall with or without intima in WT and AFF3ir-ORF2 knockout mice. As shown in Figure 1B and Figure S1A, while the expression of AFF3ir-ORF2 was notably downregulated in the aortic intima of athero-prone regions compared to the protective region, it remained largely unchanged in the aortic wall without intima across different regions of the aorta. This suggested that AFF3ir-ORF2 might play a predominant role in endothelial cells rather than other cell types in the context of shear stress.

      Second, we have used human endothelial cells (HUVECs) to further confirm our findings. As shown in Figure 2C and Figure S2B, we found that AFF3ir-ORF2 overexpression could attenuate disturbed shear stress-induced IRF5 nuclear translocation and the expression of inflammatory genes in HUVECs, suggesting the potential anti-inflammatory effects of AFF3ir-ORF2 in endothelial cells.

      Third, we agree with the reviewer’s comment that we cannot completely exclude the potential involvement of other cell types. Hence, we have included a limitation statement in the discussion part in Lines 341-344.

      The weakest part of the manuscript is the in vitro experiment using some nonidentifiable expression differences. The data is used to hypothesise on a role for IRF5 in the effects observed with Aff3ir-ORF2 knockout.

      Thank you for the comments. To address your concerns, we have made the following two changes:

      First, we have further investigated the functional features of the differentially expressed genes from the RNA-seq in the context of the atherosclerotic microenvironment. The differential gene list was mapped onto the atherosclerosis-related gene dataset (PMID: 27374120), and a total of 363 genes overlapped. These 363 genes were subjected to bioinformatics enrichment analysis using the Gene Ontology (GO) database. GO analysis showed that these genes were mainly enriched in cell−cell adhesion and leukocyte activation involved in immune response, which aligns with the effect of AFF3ir-ORF2 on VCAM-1 expression. The newly acquired data are presented in Figure S5B, and the description of the results has been updated in the revised manuscript (Lines 227-233).

      Second, we have further verified the RNA-seq results in vitro. Several classical inflammatory factors, including ICAM-1, CCL5, and CXCL10, whose mRNA levels were significantly downregulated in the RNA-seq data and which were also identified as target genes of IRF5, were analyzed. We found that AFF3ir-ORF2 deficiency aggravated, while AFF3ir-ORF2 overexpression attenuated, the expression of ICAM-1, CCL5, and CXCL10 induced by disturbed shear stress (new Figure S5D). In addition, the regulation of ICAM-1 by AFF3ir-ORF2 was confirmed at both the protein and mRNA levels in HUVECs (Figure 2C-D and Figure S2B).

      Overall, the paper succeeds in demonstrating a link between Aff3ir-ORF2 and atherosclerosis, but the cell types involved and mechanisms remain unclear. The study also shows a functional interaction between Aff3ir-ORF2 and IRF5 in embryonic fibroblasts, but any relevance of this mechanism for atherosclerosis or any cell types involved in the development of this disease remains largely speculative.

      Thank you for all the valuable comments. The specific responses have been provided above. Briefly, we have followed your advice and further confirmed the regulation of AFF3ir-ORF2 on IRF5 in endothelial cells. Besides, the RNA-seq results have been further analyzed, and partial results have been verified in endothelial cells to support the anti-inflammatory role of AFF3ir-ORF2. We greatly appreciate the reviewer’s insightful comments, which guided our revisions and contributed to significantly improving the paper.

      Reviewer #3 (Public review):

      This study aims to demonstrate the role of Aff3ir-ORF2 in atheroprone flow-induced EC dysfunction and ensuing atherosclerosis in mouse models. Overall, the data quality and comprehensiveness are convincing. In silico, in vitro, and in vivo experiments, including several atherosclerosis models, were well executed. To strengthen the study further, the authors could address its relevance to human ECs.

      We thank you for your positive remarks and insightful comments.

      Major comments:

      (1) The tissue source in Figures 1A and 1B should be clarified: whole aortic segments or intima? If aortic segments were used, the authors should repeat the experiments using intima, given the focus of the current study on the endothelium.

      We thank you for the suggestion. The tissue used in Figures 1A and 1B was from aortic intima. The description has been updated for clarity in the revised manuscript on Lines 114-125. 

      (2) Why were MEFs used exclusively in the in vitro experiments? Can the authors repeat some of the critical experiments in mouse or human ECs?

      Thank you for this insightful comment. Isolation and culture of primary mouse aortic ECs are notoriously difficult technically, and shear stress experiments require a large number of cells. Because MEFs exhibit responses consistent with those of ECs, as has been carefully demonstrated (PMID: 23754392), we used MEFs in our in vitro experiments.

      However, following your valuable advice, we have now employed human ECs (HUVECs) to confirm our findings. Consistent with our results in MEFs, we found that AFF3ir-ORF2 overexpression reduced the expression of inflammatory genes induced by disturbed shear stress at both the protein and mRNA levels in HUVECs (Figure 2C, Figure S2B). Notably, despite the significant anti-inflammatory effects of AFF3ir-ORF2, the sequence of this gene is not conserved in Homo sapiens and lacks an initiation codon, which is why we did not proceed with loss-of-function experiments in human cells.

      (3) The authors should explain why AFF3ir-ORF2 overexpression did not affect the basal level expression of ICAM-1, VCAM-1, IL-1b, and IL-6 under ST conditions (Figure 2A-C).

      We thank you for raising this critical question. Indeed, we found that AFF3ir-ORF2 overexpression did not affect the basal level of inflammatory genes under ST conditions, while it exerted anti-inflammatory effects under OSS conditions. One underlying reason might be the relatively low expression of inflammatory genes under ST compared with OSS conditions. Additionally, as our findings suggest, AFF3ir-ORF2 exerts its anti-inflammatory role by binding to IRF5 and inhibiting its nuclear translocation. However, as shown in Figure 4I, IRF5 might be predominantly localized in the cytoplasm rather than the nucleus under ST conditions.

      We have included the description in the revised manuscript on Lines 157-163.

      (4) Please include data from sham controls, i.e., right carotid artery in Figure 2E.

      Thank you for the suggestion. We have followed your advice and included sham controls (staining of the right carotid arteries) in Figure S2E.

      (5) Given that the merit of the study lies in the effect of different flow patterns, the lesion areas in AA and TA (Figure 3B, 3C) should be compared separately.

      We have followed your valuable suggestion and included the additional statistical results in Figure 3C in the revised manuscript.

      (6) For confirmatory purposes for the variations of IRF5 and IRF8, can the authors mine available RNA-seq or even scRNA-seq data on human or mouse atherosclerosis? This approach is important and could complement the current results that are lacking EC data.

      Thank you for your valuable suggestion. In the present study, we found that disturbed flow did not alter the protein level of IRF5 but promoted its nuclear translocation. Following your advice, we analyzed the expression of IRF5 in human ECs (GSE276195) and atherosclerotic mouse arteries (GSE222583) using public databases. Consistently, IRF5 did not show significant changes in mRNA levels under these conditions (Figure S5E-F), suggesting that the regulation of IRF5 in the context of disturbed flow or atherosclerosis is primarily post-translational.

      (7) Given the efficacy of AAV-ICAM2-AFF3ir-ORF2 in reducing atherosclerosis (Figure 6), the authors are encouraged to use lung ECs isolated from AFF3ir-ORF2<sup>-/-</sup> mice to recapitulate its regulation of IRF5.

      We greatly appreciate your valuable suggestion to use lung ECs from mice. We have observed that AFF3ir-ORF2 deficiency enhanced the nuclear translocation of IRF5 induced by OSS. Notably, the transcriptional levels of IRF5 were minimally affected by AFF3ir-ORF2 deficiency. Hence, recapitulating the regulation of IRF5 with lung ECs isolated from AFF3ir-ORF2<sup>-/-</sup> mice would require treating lung ECs with OSS followed by isolation of subcellular fractions. However, both in vitro shear stress treatment and subcellular fractionation require a large number of cells, and mouse lung ECs are difficult to culture and expand over several passages. Therefore, we hope the reviewer understands that these experiments were not performed. As an alternative, we have confirmed the changes in IRF5 transcriptional activity upon AFF3ir-ORF2 manipulation by analyzing the expression of its target genes, as indicated by the RNA-seq results, in both the intima of the mouse aorta (Figure S5C-D) and HUVECs (Figure 2C-D and Figure S2B). Our findings show that AFF3ir-ORF2 deficiency increases, while its overexpression decreases, the expression levels of IRF5 target genes in endothelial cells.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figure 2H - As I understand it, this is MFI measurement of VCAM. Please change accordingly.

      Thanks. Corrected.

      Reviewer #2 (Recommendations for the authors):

      My major concern is the use of MEFs for all in vitro experiments. All experiments should be done in endothelial cells if the aim is to show a mechanism relevant to endothelial activation and atherosclerosis. Lines 314-316 of the conclusion are absolutely not supported by the data.

      Thank you for the insightful comment. Following your advice, we have employed human ECs (HUVECs) to confirm our findings. Consistent with the findings in MEFs, we found that AFF3ir-ORF2 decreased the expression of inflammatory genes induced by disturbed shear stress, both at protein and mRNA levels in HUVECs (Figure 2C, Figure S2B). 

      Since the in vivo experiments are not cell type-specific, it would be important to test and compare the expression of Aff3ir-ORF2 in endothelial cells as well as smooth muscle and macrophages to support any claim of cell type involvement in the effects observed.

      We thank you for the valuable suggestion. In the revised manuscript, we have followed your suggestion and analyzed the expression pattern of AFF3ir-ORF2 in different regions of the aorta with or without endothelium. We observed a marked reduction in AFF3ir-ORF2 expression in the intima of the aortic arch compared with that in the intima of the thoracic aorta (Figure 1B-C). In contrast, the expression of AFF3ir-ORF2 in the media and adventitia was comparable between the aortic arch and thoracic aorta (Figure S1A-B). These findings provide further evidence supporting the predominant role of endothelial cells. The description has been modified accordingly in the revised manuscript on Lines 121-134.

      The results of the RNA-seq experiment should be disclosed. The experiment should be deposited on GEO or similar and a table of differentially expressed genes added to the manuscript.

      Thank you for the suggestion. We have followed your advice and submitted the RNA-sequencing data to GEO datasets (GSE286206). Besides, a table of differentially expressed genes has been included in the revised manuscript as Table S3.

      Minor comments:

      (1) Figure 1A. Missing the labels of the target.

      Thanks. Corrected. 

      (2) Figure 1D. Cell alignment in AA compared to TA suggests that the image is of the outer curvature, but Figure 1F is showing that the outer curvature is expressing more ORF2 than the inner. Why was the outer curvature chosen for this panel and is it true to conclude on that assumption that expression of ORF2 compares as TA > Outer > Inner curvature?

      We thank you for the insightful suggestion. We have followed your advice and performed en-face immunofluorescence staining and quantification of AFF3ir-ORF2 expression in the AA inner, AA outer, and TA regions. As shown in new Figure 1D-E, the results indeed indicate that AFF3ir-ORF2 expression follows the order TA > AA outer > AA inner.

      (3) Figure 2H. Target mislabelled as ICAM-1 instead of VCAM-1.

      Thanks. Corrected. 

      (4) Figure S1A. VE-cad staining and cell shape differ between control and overexpression. Is this a phenotype or are different areas of the vasculature shown, which would make it hard to interpret since Aff3ir-ORF2 levels differ in different vessel areas?

      We thank the reviewer for raising this important question. For Figure S1A, only common carotid arteries were used for the staining. The potential differences in cell shape observed might be due to variations in the procedure during immunofluorescence staining. To avoid any misinterpretation, more representative images have been provided in the revised Figure S2C.

      (5) Figure 3D-G. Images are not representative of the quantification results.

      Thank you. The images have been replaced with more representative ones in the revised Figure 3D and Figure 3F.

      (6) Line 220. Data for IRF8 are not shown in the figure to support this claim.

      Thank you for pointing this out. The expression level of IRF8 has been included in Figure S5C.

      (7) Figure 6F. AAV-AFF3ir-ORF2 panel order inverted.

      Thanks. Corrected. 

      (8) Line 401. Typo: "hat" instead of "h at".

      Sorry for the typo. Corrected.

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      (1) The rationale for the following sentence (lines 126-128) is lacking: "Moreover, we observed the expression of AFF3ir-ORF2 in longitudinal sections of the mouse aorta (B. Li et al., 2019)".

      Thanks. The rationale for these experiments has been included in the revised manuscript on Lines 127-129.

      (2) The source of antibodies against AFF3ir-ORF1 and AFF3ir-ORF2 used in western blot and immunostaining experiments were not mentioned in the manuscript.

      Thanks. The antibody information has been included in the Methods section on Lines 456-457 and 510-511.

      (3) The rationale and data interpretation are not clear for the following sentence (lines 220-221): "In addition, neither IRF5 nor IRF8 expression was regulated by AFF3ir-ORF2 (Figure 4F)".

      Thank you for pointing this out. The expression level of IRF8 has been included in Figure S5C. The sentence has been modified accordingly on Lines 253-254.

      (4) The quality of AFF3ir-ORF2 blot in Figure 4I needs improvement.

      Thanks. More representative images have been included in Figure 4I.

      (5) It appears that AFF3ir-ORF2 was present in both the cytoplasm and nucleus. Does AFF3ir-ORF2 have a nuclear entry peptide? Also, the nuclear entry of AFF3ir-ORF2 could be further supported by an immunofluorescence staining experiment.

      Thank you for your insightful comments. Indeed, although we did not observe any significant changes in the subcellular localization of AFF3ir-ORF2 under shear stress conditions, our immunostaining results revealed that AFF3ir-ORF2 is localized in both the cytoplasm and nucleus. To explore whether AFF3ir-ORF2 contains nuclear localization signals, we used the NLStradamus tool (http://www.moseslab.csb.utoronto.ca/NLStradamus/) to analyze its sequence. The prediction indicated that AFF3ir-ORF2 lacks a nuclear localization signal.

    1. Author response:

      Reviewer 1: “The authors over-emphasized this study's relevance to RP disease (i.e. patients and mammals are not capable of regeneration like zebrafish).”

      It is true that humans and other mammals are not capable of regeneration. This is why we and many other groups study zebrafish to identify mechanisms of regeneration that successfully form new rods. That said, our previous paper on the molecular basis of retinal remodeling in this zebrafish model system (Santhanam et al., 2023; Cell Mol Life Sci. 2023;80(12):362) revealed remarkable similarities between the stress and physiological responses of rods, cones, RPE, and inner retinal neurons in this model and those in mammalian RP models. Thus, we believe this zebrafish line is an adequate model of RP and an excellent model to study rod regeneration.

      Reviewer 1: “They under-explained this regeneration's relevance or difference to normal developmental process, which is pretty much conserved in evolution.”  and:

      Reviewer 3: “It would also benefit from integration with single-cell multiome data from developing retinas (Lyu, et al. 2023).”

      It is an excellent suggestion to compare the regenerative response we have studied in a chronic degeneration/regeneration model to the trajectory of developmental rod formation. Lyu et al. (2023) found that, while retinal regeneration has similarities to retinal development, it does not precisely recapitulate the same transcription factors and processes. Any differences between this trajectory and that revealed in developmental studies would be enlightening. We intend to perform such analyses and add them to a revised manuscript in the future.

      Reviewer 2: “Perhaps the authors can consider explaining why the Prdm1a knock-down cells would have a higher Retp1 signal per cell in Fig 9B. Is this a representative picture? This appears to contradict Figure 8's conclusion, although I could tell that the number of Retp1+ cells in the ONL appears to be lower.”

      These are different experimental paradigms.  Figure 8 shows knockdown 48 hours after injection, at which time prdm1a knockdown is affecting rhodopsin expression directly.  That experiment investigated whether prdm1a knockdown affected progenitor proliferation.  Figure 9 shows a time point 6 days after injection, at which time we were asking if prdm1a knockdown affected differentiation of progenitors into rods. 

      Reviewer 2: “The authors noted "Surprisingly, the knockdown of prdm1a resulted in a significantly higher number of rhodopsin-positive cells in the INL (p=0.0293)", while it appears in Figure 9B, 9C that the difference is 2 cells vs 0 in a rightly broader field. It seems to be too strong of a statement for this effect.”

      This was a very unexpected finding. We included statistics (Figure 9D) to support the finding, so we do not think it is too strong a statement to make. Speculation as to what might cause this is fascinating. Are Müller cells producing progenitors that fail to migrate to the ONL before differentiating into rods? The lack of BrdU labeling does not support this idea. Do neurogenic progenitor cells in the INL differentiate towards rods via a pathway that does not require prdm1a? Perhaps. There may also be other explanations.

      Reviewer 2: “It appears to this reviewer that the proteomic data didn't reveal much in line with the overall hypothesis or the mechanism, and it's unclear why the authors went for proteomics rather than bulk RNA-seq or ChIP-seq for a transcription factor knock-down experiment. Overall this is a minor point.”

      We agree that bulk RNA sequencing would provide a similar answer, possibly with greater sensitivity. We chose proteomics for two reasons: 1) We wanted an independent assessment of the knockdown effects that could evaluate whether the knockdowns worked and which pathways were affected. Since our pathway comparison is to single-cell RNA-seq data, bulk RNA-seq would not have been fully independent. 2) Because we used translation-blocking antisense oligos for most knockdown experiments, we did not expect the transcript abundance of the targeted gene to be affected, although these oligos can lead to target transcript degradation. Thus, we were unlikely to be able to validate our knockdowns with bulk RNA-seq.

      Reviewer 3: “The gene regulatory network analysis here would also benefit from the addition of matched scATAC-Seq data, …”

      This is certainly true, and the reviewer points to several studies that have made excellent use of this strategy.  Given the 1-2 year timeline to obtain and analyze such data, it is unlikely that we will be able to incorporate such data in our revised manuscript, but we hope to do so for follow-up studies.

      Reviewer 3: “The description of the time points analyzed is vague, stating only that "fish from 6 to 12 months of age were analyzed". Since photoreceptor degeneration is progressive, it is unclear how progenitor behavior changes over time, or how the gene expression profile of other cell types such as microglia, cones, or surviving rods is altered by disease progression.”

      We have shown in a previous study (Santhanam et al. Cells. 2020;9(10)) that rod degeneration and regeneration are in a steady state from at least 4 to 8 months of age, and in other experiments in the lab at least to 12 months of age.  In this age range, regeneration keeps up with the pace of degeneration, both of which are very fast.  This encompasses the cell types that we specifically study in this manuscript.  The reviewer is right that other cell types could undergo changes.  This is a separate topic of study in the lab.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The objective of this research is to understand how the expression of key selector transcription factors, Tal1, Gata2, Gata3, involved in GABAergic vs glutamatergic neuron fate from a single anterior hindbrain progenitor domain is transcriptionally controlled. With suitable scRNAseq, scATAC-seq, CUT&TAG, and footprinting datasets, the authors use an extensive set of computational approaches to identify putative regulatory elements and upstream transcription factors that may control selector TF expression. This data-rich study will be a valuable resource for future hypothesis testing, through perturbation approaches, of the many putative regulators identified in the study. The data are displayed in some of the main and supplemental figures in a way that makes it difficult to appreciate and understand the authors' presentation and interpretation of the data in the Results narrative. Primary images used for studying the timing and coexpression of putative upstream regulators, Insm1, E2f1, Ebf1, and Tead2 with Tal1 are difficult to interpret and do not convincingly support the authors' conclusions. There appears to be little overlap in the fluorescent labeling, and it is not clear whether the signals are located in the cell soma nucleus.

      Strengths:

      The main strength is that it is a data-rich compilation of putative upstream regulators of selector TFs that control GABAergic vs glutamatergic neuron fates in the brainstem. This resource now enables future perturbation-based hypothesis testing of the gene regulatory networks that help to build brain circuitry.

      We thank Reviewer #1 for the thoughtful assessment and recognition of the extensive datasets and computational approaches employed in our study. We appreciate the acknowledgment that our efforts in compiling data-rich resources for identifying putative regulators of key selector transcription factors (TFs)—Tal1, Gata2, and Gata3—are valuable for future hypothesis-driven research.

      Weaknesses:

      Some of the findings could be better displayed and discussed.

      We acknowledge the concerns raised regarding the clarity and interpretability of certain figures, particularly those related to expression analyses of candidate upstream regulators such as Insm1, E2f1, Ebf1, and Tead2 in relation to Tal1. We agree that clearer visualization and improved annotation of fluorescence signals are crucial to accurately support our conclusions. In our revised manuscript, we will enhance image clarity and clearly indicate sites of co-expression for Tal1 and its putative regulators, ensuring the results are more readily interpretable. Additionally, we will expand explanatory narratives within the figure legends to better align the figures with the results section.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript, the authors seek to discover putative gene regulatory interactions underlying the lineage bifurcation process of neural progenitor cells in the embryonic mouse anterior brainstem into GABAergic and glutamatergic neuronal subtypes. The authors analyze single-cell RNA-seq and single-cell ATAC-seq datasets derived from the ventral rhombomere 1 of embryonic mouse brainstems to annotate cell types and make predictions of where TFs bind upstream and downstream of the effector TFs using computational methods. They add data on the genomic distributions of some of the key transcription factors and layer these onto the single-cell data to get a sense of the transcriptional dynamics.

      Strengths:

      The authors use a well-defined fate decision point from brainstem progenitors that can make two very different kinds of neurons. They already know the key TFs for selecting the neuronal type from genetic studies, so they focus their gene regulatory analysis squarely on the mechanisms that are immediately upstream and downstream of these key factors. The authors use a combination of single-cell and bulk sequencing data, prediction and validation, and computation.

      We also appreciate the thoughtful comments from Reviewer #2, highlighting the strengths of our approach in elucidating gene regulatory interactions that govern neuronal fate decisions in the embryonic mouse brainstem. We are pleased that our focus on a critical cell-fate decision point and the integration of diverse data modalities, combined with computational analyses, has been recognized as a key strength.

      Weaknesses:

      The study generates a lot of data about transcription factor binding sites, both predicted and validated, but the data are substantially descriptive. It remains challenging to understand how the integration of all these different TFs works together to switch terminal programs on and off.

      Reviewer #2 correctly points out that while our study provides extensive data on predicted and validated transcription factor binding sites, clearly illustrating how these factors collectively interact to regulate terminal neuronal differentiation programs remains challenging. We acknowledge the inherently descriptive nature of the current interpretation of our combined datasets.

      In our revision, we will clarify how the different data types support and corroborate one another, highlighting what we consider the most reliable observations of TF activity. Additionally, we will revise the discussion to address the challenges associated with interpreting the highly complex networks of interactions within the gene regulatory landscape.

      We sincerely thank both reviewers for their constructive feedback, which we believe will significantly enhance the quality and accessibility of our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors demonstrate impairments induced by a high cholesterol diet on GLP-1R dependent glucoregulation in vivo as well as an improvement after reduction in cholesterol synthesis with simvastatin in pancreatic islets. They also map sites of cholesterol high occupancy and residence time on active versus inactive GLP-1Rs using coarse-grained molecular dynamics (cgMD) simulations and screened for key residues selected from these sites and performed detailed analyses of the effects of mutating one of these residues, Val229, to alanine on GLP-1R interactions with cholesterol, plasma membrane behaviour, clustering, trafficking and signalling in pancreatic beta cells and primary islets, and describe an improved insulin secretion profile for the V229A mutant receptor.

      These are extensive and very impressive studies indeed. I am impressed with the tireless effort exerted to understand the details of molecular mechanisms involved in the effects of cholesterol for GLP-1 activation of its receptor. In general, the study is convincing, the manuscript well written and the data well presented.

      Some of the changes are small and insignificant which makes one wonder how important the observations are. For instance, in figure 2 E (which is difficult to interpret anyway because the data are presented in percent, conveniently hiding the absolute results) does not show a significant result of the cyclodextrin except for insignificant increases in basal secretion. That is not identical to impairment of GLP-1 receptor signaling!

      We assume that the reviewer refers to Figure 1E, where we show the percentage of insulin secretion in response to 11 mM glucose +/- exendin-4 stimulation in mouse islets pretreated with vehicle or MβCD loaded with 20 mM cholesterol. While we concur with the reviewer that the effect in this case is triggered by increased basal insulin secretion at 11 mM glucose, exendin-4 appears to no longer compensate for this increase by proportionally amplifying insulin responses in cholesterol-loaded islets, leading to a significantly decreased exendin-4-induced insulin secretion fold increase under these circumstances, as shown in Figure 1F. We interpret these results as a defect in the GLP-1R capacity to amplify insulin secretion beyond the basal level to the same extent as in vehicle conditions. An alternative explanation is that there is a maximum level of insulin secretion in our cells, and 11 mM glucose + exendin-4 stimulation gets close to that value. With the increasing effect of cholesterol-loaded MβCD on basal secretion at 11 mM glucose, exendin-4 stimulation would then appear to work less well.

      We have performed a simple experiment to investigate this possibility: insulin secretion following stimulation with a secretagogue cocktail (20 mM glucose, 30 mM KCl, 10 µM FSK and 100 µM IBMX) in islets +/- MβCD/cholesterol loading to determine if maximal stimulation had been reached or not in our original experiment. This experiment, now included in Supplementary Figure 1C, demonstrates that insulin secretion can increase up to ~4% (from ~2%) in our islets, supporting our initial conclusion. We have also included absolute insulin concentrations as well as percentages of secretion for all the experiments included in the study in the new Supplementary File 1 to improve the completeness of the report.

      To me the most important experiment of them all is the simvastatin experiment, but the results rest on very few numbers and there is a large variation. Apparently, in a previous study using more extensive reduction in cholesterol the opposite response was detected casting doubt on the significance of the current observation. I agree with the authors that the use of cyclodextrin may have been associated with other changes in plasma membrane structure than cholesterol depletion at the GLP-1 receptor.

      We agree with the reviewer that the insulin secretion results in vehicle versus LPDS/simvastatin treated mouse islets (Figure 1H, I) are relatively variable. We have therefore performed 2 extra biological repeats of this experiment (for a total n of 7). Results now show a significant increase in exendin-4-stimulated secretion with no change in basal secretion in islets pre-incubated with LPDS/simvastatin.  

      The entire discussion regarding the importance of cholesterol would benefit tremendously from studies of GLP-1 induced insulin secretion in people with different cholesterol levels before and after treatment with cholesterol-lowering agents. I suspect that such a study would not reveal major differences.

      We agree with the reviewer that such a study would be highly relevant. While this falls outside the scope of the present paper, we encourage other researchers with access to clinical data on GLP-1R agonist responses in individuals taking cholesterol lowering agents to share their results with the scientific community. We have highlighted this point in the paper discussion to emphasise the importance of more research in this area.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript the authors provided a proof of concept that they can identify and mutate a cholesterol-binding site of a high-interest class B receptor, the GLP-1R, and functionally characterize the impact of this mutation on receptor behavior in the membrane and downstream signaling with the intent that similar methods can be useful to optimize small molecules that as ligands or allosteric modulators of GLP-1R can improve the therapeutic tools targeting this signaling system.

      Strengths:

      The majority of results on receptor behavior are elucidated in INS-1 cells expressing the wt or mutant GLP-1R, with one experiment translating the findings to primary mouse beta-cells. I think this paper lays a very strong foundation to characterize this mutation and does a good job discussing how complex cholesterol-receptor interactions can be (i.e., lower cholesterol binding to V229A GLP-1R, yet increased segregation to lipid rafts). Table 1 and Figure 9 are very beneficial to summarize the findings. The lower interaction with cholesterol and lower membrane diffusion in V229A GLP-1R resembles the reduced diffusion of wt GLP-1R with simvastatin-induced cholesterol reductions, although by presumably decreasing the cholesterol available to interact with wt GLP-1R. It would be interesting to see whether lowering cholesterol alters other behaviors of wt GLP-1R that look similar to V229A GLP-1R. I further wonder if the authors expect that increased cholesterol content of islets (with loading of MβCD saturated with cholesterol or high-cholesterol diets) would elevate baseline GLP-1R membrane diffusion, and if a more broad relationship can be drawn between GLP-1R membrane movement and downstream signaling.

      Membrane diffusion experiments are difficult to perform in intact islets as our method requires cell monolayers for RICS analysis. We however agree that it is of interest to investigate if cholesterol loading affects GLP-1R diffusion. To this end, we have performed further RICS analysis in INS-1 832/3 SNAP/FLAG-hGLP-1R cells pretreated with vehicle or MβCD loaded with 20 mM cholesterol (new Supplementary Figures 1D and 1E). Interestingly, results show significantly increased plasma membrane diffusion of exendin-4-stimulated receptors, with no change in basal diffusion, following MβCD/cholesterol loading. This behaviour differs from that of the V229A mutant receptor which shows reduced diffusion under basal conditions, a pattern that mimics that of the WT receptor under low cholesterol conditions (by pre-treatment with LPDS/simvastatin).

      Weaknesses:

      I think there are no obvious weaknesses in this manuscript and overall, I believe the authors achieved their aims and have demonstrated the importance of cholesterol interactions on GLP-1R functioning in beta-cells. I think this paper will be of interest to many physiologists who may not be familiar with many of the techniques used in this paper and the authors largely do a good job explaining the goals of using each method in the results section.

      The intent of some methods, for example the Laurdan probe studies, are better expanded in the discussion.

      We have expanded on the rationale behind the use of Laurdan to assess behaviours of lipid-packed membrane nanodomains in the methods, results and discussion of the revised manuscript.

      I found it unclear what exactly was being measured to assess 'receptor activity' in Fig 7E and F.

      Figures 7E and F refer to bystander complementation assays measuring the recruitment of nanobody 37 (Nb37)-SmBiT, which binds to active Gαs, to either the plasma membrane (labelled with KRAS CAAX motif-LgBiT), or to endosomes (labelled with Endofin FYVE domain-LgBiT) in response to GLP-1R stimulation with exendin-4. This assay therefore measures GLP-1R activation specifically at each of these two subcellular locations. We have included a schematic of this assay in the new Supplementary Figure 3 to clarify the aim of these experiments.

      Certainly many follow-up experiments are possible from these initial findings and of primary interest is how this mutation affects insulin homeostasis in vivo under different physiological conditions. One of the biggest pathologies in insulin homeostasis in obesity/T2D is an elevation of baseline insulin release (as modeled in Fig 1E) that renders the fold-change in glucose-stimulated insulin levels lower and physiologically less effective. No difference in primary mouse islet baseline insulin secretion was seen here but I wonder if this mutation would ameliorate diet-induced baseline hyperinsulinemia.

      We concur with the reviewer that it would be interesting to determine the effects of the GLP-1R V229A mutation on insulin secretion responses under diet-induced metabolic stress conditions. While performing in vivo experiments on glucoregulation in mice harbouring the V229A mutation falls outside the scope of the present study, we have included ex vivo insulin secretion experiments in islets from GLP-1R KO mice transduced with adenoviruses expressing SNAP/FLAG-hGLP-1R WT or V229A and subsequently treated with vehicle versus MβCD loaded with 20 mM cholesterol to replicate the conditions of Figure 1E in the new Supplementary Figure 4.

      I would have liked to see the actual islet cholesterol content after 5 weeks of high-cholesterol diet measured to correlate increased cholesterol load with diminished glucose-stimulated insulin. While not necessary for this paper, a comparison of islet cholesterol content after this cholesterol diet vs the more typical 60% HFD used in obesity research would be beneficial for GLP-1 physiology research broadly to take these findings into consideration with model choice.

      We have included these data in Supplementary Figure 1A.

      Another area to further investigate is does this mutation alter ex4 interaction/affinity/time of binding to GLP-1 or are all of the described findings due to changes in behavior and function of the receptor?

      To answer this question, we have performed binding affinity experiments in INS-1 832/3 SNAP/FLAG-hGLP-1R WT versus V229A cells, which show no differences (new Supplementary Figure 2D).

      Lastly, I wonder if V229A would have the same impact in a different cell type, especially in neurons? How similar are the cholesterol profiles of beta-cells and neurons? How this mutation (and future developed small molecules) may affect satiation, gut motility, and especially nausea, is of high translational interest. The comparison is drawn in the discussion between this mutation and ex4-phe1 to have biased agonism towards Gs over beta-arrestin signaling. Ex4-phe1 lowered pica behavior (a proxy for nausea) in a paper on ex4-phe1 previously co-authored by the authors (PMID 29686402) and I think drawing a parallel for this mutation or modification of cholesterol binding to potentially mitigate nausea is worth highlighting.

      While experiments in neurons are outside the scope of the present study, we have added this worthy point to the discussion and hypothesise on possible effects of GLP-1R mutants with modified cholesterol interactions on central GLP-1R actions in the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      There are no line numbers

      These have now been added.

      Abstract: "Cholesterol is a plasma membrane enriched lipid" - sorry for being finicky, but shouldn't this read; "a lipid often enriched in plasma membranes"

      We have modified the abstract to state that: “Cholesterol is a lipid enriched at the plasma membrane”.

      p. 4 "Moreover, islets extracted from high cholesterol-fed mice". How do you "extract islets"?

      We have exchanged the term “extracted” by “isolated”. Islet isolation is described in the paper methods section.

      p. 4 The sentence "These effects were accompanied by decreased GLP-1R plasma membrane diffusion under vehicle conditions, measured by Raster Image Correlation Spectroscopy (RICS) in rat insulinoma INS-1 832/3 cells with endogenous GLP-1R deleted [INS-1 832/3 GLP-1R KO cells (27)] stably expressing SNAP/FLAG-tagged human GLP-1R (SNAP/FLAG-hGLP-1R), an effect that is normally triggered by agonist binding (28), as also observed here (Supplementary Figure 1C, D)" is a masterpiece of complexity. Perhaps breaking up would facilitate reading?

      This paragraph has now been modified in the revised manuscript.

      p. 5. I cannot evaluate the "coarse grain molecular dynamics" studies.

      Reviewer #2 (Recommendations for the authors):

      I view this as an excellent manuscript with very comprehensive work and clear translational relevance. I don't think any further experiments are needed for the scope outlined in this manuscript. The discussion is already long but a short postulation on how this may translate to GLP-1R-cholesterol interactions in other cell types, specifically neurons with the intent on manipulating satiation and nausea, could be worthwhile.

      This has now been added.

      The only thing for readability I would suggest is a sentence in the results mentioning why you're doing the Laurdan analysis, and what is the output for assessing 'receptor activity' in the membrane and endosomes.

      Both points have now been added.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors examine CD8 T cell selective pressure in early HCV infection using. They propose that after initial CD8-T mediated loss of virus fitness, in some participants around 3 months after infection, HCV acquires compensatory mutations and improved fitness leading to virus progression.

      Strengths:

      Throughout the paper, the authors apply well-established approaches in studies of acute to chronic HIV infection for studies of HCV infection. This lends rigor to the authors' work.

      Weaknesses:

      (1) The Discussion could be strengthened by a direct discussion of the parallels/differences in results between HIV and HCV infections in terms of T cell selection, entropy, and fitness.

      We have added a direct discussion of the parallels/differences between HIV and HCV throughout the discussion including at lines 308-310 and 315-327.

      Lines 308-310: “In fact, many parallels can be drawn between HIV infections and HCV infections in the context of emerging viral species that escape T cell immune responses.”

      Lines 315-327: “One major difference between HCV and HIV infection is the event where patients infected with HCV have an approximately 25% chance to naturally clear the infection as opposed to just achieving viral control in HIV infections. Here, we probed the underlying mechanism, and questioned how the host immune response and HCV mutational landscape can allow the virus to escape the immune system. To understand this process, taking inspiration from HIV studies (24), a quantitative analysis of viral fitness relative to viral haplotypes was conducted using longitudinal samples to investigate whether a similar phenomenon was identified in HCV infections for our cohort for patients who progress to chronic infection. We observed a decrease in population average relative fitness in the period of <90DPI with respect to the T/F virus in chronic subjects infected with HCV. The decrease in fitness correlated positively with IFN-γ ELISPOT responses and negatively with SE indicating that CD8+ T-cell responses drove the rapid emergence of immune escape variants, which initially reduced viral fitness. This is similarly reflected in HIV infected patients where strong CD8+ T-cell responses drove quicker emergence of immune escape variants, often accompanied by compensatory mutations (24).”

      (2) In the Results, please describe the Barton model functionality and why the fitness landscape model was most applicable for studies of HCV viral diversity.

      This has been added to the introduction section rather than Results as we feel that it is more appropriate to show why it is most applicable to HCV viral diversity in the background section of the manuscript. We write at lines 77-90:

      “Barton et al.’s [23] approach to understand HIV mutational landscape resulting in immune escape had two fundamental points: 1) replicative fitness depends on the virus sequence and the requirement to consider the effect of co-occurring mutations, and 2) evolutionary dynamics (e.g. host immune pressure). Together they pave the way to predict the mutational space in which viral strains can change given the unique immune pressure exerted by individuals infected with HIV. This model fits well with the pathology of HCV infection. For instance, HIV and HCV are both RNA viruses with a rapid rate of mutation. Additionally, like HIV, chronic infection is an outcome for HCV infected individuals, however, unlike HIV, there is a 25% probability that individuals infected with HCV will naturally clear the virus. Previously published studies [9] have shown that HIV also goes through a genetic bottleneck which results in the T/F virus losing dominance and replaced by a chronic subtype, identified by the immune escape mutations. The concepts in Barton’s model and its functionality to assess the fitness based on the complex interaction between viral sequence composition and host immune response is also applicable to early HCV infection.”

      (3) Recognize the caveats of the HCV mapping data presented.

      We have now recognized the caveats of the HCV mapping data at lines 354-356: “While our findings here are promising, it should be recognized that although the bioinformatics tool (iedb_tool.py) proved useful for identifying potential epitopes, there could be epitopes that are not predicted or false-positive from the output which could lead to missing real epitopes”.

      (4) The authors should provide more data or cite publications to support the authors' statement that HCV-specific CD8 T cell responses decline following infection.

      We have now clarified at lines 352-353 that the decline was toward “selected epitopes that showed evidence of escape”.

      Furthermore, we have cited two publications at line 352 that support our statement.

      (5) Similarly, as the authors' measurements of HCV T and humoral responses were not exhaustive, the text describing the decline of T cells with the onset of humoral immunity needs caveats or more rigorous discussion with citations (Discussion lines 319-321).

      We have now added a caveat in the discussion at lines 357-360 which reads

      “In conclusion, this study provides initial insights into the evolutionary dynamics of HCV, showing that an early, robust CD8+ T-cell response without nAbs strongly selects against the T/F virus, enabling it to escape and establish chronic infection. However, these findings are preliminary and not exhaustive, warranting further investigation to fully understand these dynamics. “

      (6) What role does antigen drive play in these data, for both T cell and antibody induction?

      It is possible that HLA-adapted mutations could limit CD8 T cell induction if the HLAs were matched between transmission pairs, as has been shown previously for HIV (https://doi.org/10.1371/journal.ppat.1008177) with some data for HCV (https://journals.asm.org/doi/10.1128/jvi.00912-06). However, we apologise as we are not entirely sure that this is what the reviewer is asking for in this instance.

      (7) Figure 3 - are the X and Y axes wrongly labelled? The Divergent ranges of population fitness do not make sense.

      Our apologies, there was an error with the plot in Figure 3 and the X and Y axes were wrongly labelled. This has now been resolved.

      (8) Figure S3 - is the green line, average virus fitness?

      This has now been clarified in Figure S3.

      (9) Use the term antibody epitopes, not B cell epitopes.

      We now use the term antibody epitopes throughout the manuscript.

      Reviewer #1 (Recommendations for the authors):

      Recommendations for improving the writing and presentation:

      (1) Introduction:

      Line 52: 'carry mutations B/T cell epitopes'. Two points

      i) These are antibody epitopes (and antibody selection) not B cell epitopes

      We have corrected this sentence at line 55 which now reads: “carry mutations within epitopes targeted by B cells and CD8+ T cells”.

      ii) To avoid confusion, add text that mutations were generated following selection in the donor.

      For HCV, it is unclear if mutations are generated following selection or have been occurring in low frequencies outside detection range. Only when selection by host immune pressure arises do the potentially low-frequency variants become dominant. However, we do acknowledge it is potentially misleading to only mention new variants replacing the transmitted/founder population. We have modified the sentence at line 52 to read:

      “At this stage either an existing variant that was occurring in low-frequency outside detection range or an existing variant with novel mutations generated following immune selection is observed in those who progress to chronic infection”

      - Lines 51-56: Human studies of escape and progression are associative, not causative as implied.

      Correct; current evidence suggests that escape and progression are associative. We have now corrected these lines to no longer suggest causation.

      - Line 65: Suggest you clarify your meaning of 'easier'?

      This sentence, now at line 72, has been modified to: “subtype 1b viruses have a higher probability to evade immune responses”

      (2) Results:

      - Line 147: Barton model (ref'd in Intro) is directly referred to here but not referenced.

      The reference has been added.

      - The authors should cite previous HIV literature describing associations between the rate of escape and Shannon Entropy e.g. the interaction between immunodominance, entropy, and rate of escape in acute HIV infection was described in Liu et al JCI 2013 but is not cited.

      We have now cited previous HIV research at lines 147-151, adding Liu et al:

      “Additionally, the interaction between immunodominance, entropy, and escape rate in acute HIV infection has been described, where immunodominance during acute infection was the most significant factor influencing CD8+ T cell pressure, with higher immunodominance linked to faster escape (27). In contrast, lower epitope entropy slowed escape, and together, immunodominance and entropy explained half of the variability in escape timing (27).”
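      For readers less familiar with epitope entropy, it is commonly computed as the Shannon entropy of each aligned amino-acid position, H = Σ p·log2(1/p) over the residues observed at that position, averaged across the epitope. The sketch below is illustrative only and is not the pipeline used in this study or in Liu et al.; the function names, the choice of log base 2, and ignoring gap characters ('-') are all assumptions.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (bits) of one aligned position.

    Gap characters ('-') are ignored here, which is an assumption; other
    pipelines may count gaps as a residue state.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    # p * log2(1/p) with p = c/n, written as (c/n) * log2(n/c)
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())


def epitope_entropy(alignment, start, end):
    """Mean per-site entropy over positions start..end-1 of an alignment.

    `alignment` is a list of equal-length aligned sequences (hypothetical input).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    if end <= start:
        raise ValueError("epitope range must contain at least one position")
    sites = [site_entropy([seq[i] for seq in alignment]) for i in range(start, end)]
    return sum(sites) / len(sites)


# Hypothetical 5-mer epitope observed in four sequences (not study data).
aln = ["SLYDL", "SLYEL", "SLYDL", "SLFDL"]
print(round(epitope_entropy(aln, 0, 5), 4))
```

A fully conserved position contributes 0 bits and a position split evenly between two residues contributes 1 bit, so lower values correspond to the "lower epitope entropy" described above.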

      - Line 319: The authors suggest that HCV-specific CD8 T cell response declines following early infection. On what are they basing this statement? The authors show their measured T cell responses decline but their approach uses selected epitopes and they are therefore unable to assess total HCV T cell response in participants (Where there is no escape, are T cell magnitudes maintained or do they still decline?). Can the authors cite other studies to support their statement?

      We have now clarified that the decline was toward “selected epitopes that showed evidence of escape”. Furthermore, we also cite two studies to support our findings.

      - Throughout the authors talk in terms of CD8 T cells but the ELISpot detects both CD4 and CD8 T cell responses. I suggest the authors be more explicit that their peptide design (9-10mers) is strongly biased to only the detection of CD8 T cells.

      To make this clearer and more explicit we have now added to the methods section at lines 433-435:

      “While the ELISpot assay detects responses from both CD4 and CD8 T cells, our peptide design (9-10mers) is strongly biased toward CD8 T-cell detection. We have therefore interpreted ELISpot responses primarily in terms of CD8 T-cell activity.”

      - The points made in lines 307-321 could be more succinct

      We have now edited the discussion (lines 307-321) to make the points more succinct (now lines 307-323).

      Minor corrections to text, figures:

      - Figure 2: suggest making the Key bigger and more obvious.

      We have now made the key bigger and more obvious.

      - Figure 3 A & D....is there an error on the X-axis...are you really reporting ELISpot data of < 1 spot/10^6? Perhaps the X and Y axes are wrongly labelled?

      Our apologies, there was an error with the plot in Figure 3 and the X and Y axes were wrongly labelled. This has now been resolved.

      - Figure 5: As this is PBMC, remove CD8 from the description of ELISpot. 

      We have now removed CD8 from the description of ELISpot in both Figure 5 and Figure S3.

      Reviewer #2 (Public review):

      Summary:

      In this work, Walker and collaborators study the evolution of hepatitis C virus (HCV) in a cohort of 14 subjects with recent HCV infections. They focus in particular on the interplay between HCV and the immune system, including the accumulation of mutations in CD8+ T cell epitopes to evade immunity. Using a computational method to estimate the fitness effects of HCV mutations, they find that viral fitness declines as the virus mutates to escape T-cell responses. In long-term infections, they found that viral fitness can rebound later in infection as HCV accumulates additional mutations.

      Strengths:

      This work is especially interesting for several reasons. Individuals who developed chronic infections were followed over fairly long times and, in most cases, samples of the viral population were obtained frequently. At the same time, the authors also measured CD8+ T cell and antibody responses to infection. The analysis of HCV evolution focused not only on variation within particular CD8+ T cell epitopes but also on the surrounding proteins. Overall, this work is notable for integrating information about HCV sequence evolution, host immune responses, and computational metrics of fitness and sequence variation. The evidence presented by the authors supports the main conclusions of the paper described above.

      Weaknesses:

      One notable weakness of the present version of the manuscript is a lack of clarity in the description of the method of fitness estimation. In the previous studies of HIV and HCV cited by the authors, fitness models were derived by fitting the model (equation between lines 435 and 436) to viral sequence data collected from many different individuals. In the section "Estimating survival fitness of viral variants," it is not entirely clear if Walker and collaborators have used the same approach (i.e., fitting the model to viral sequences from many individuals), or whether they have used the sequence data from each individual to produce models that are specific to each subject. If it is the former, then the authors should describe where these sequences were obtained and the statistics of the data.

      If the fitness models were inferred based on the data from each subject, then more explanation is needed. In prior work, the use of these models to estimate fitness was justified by arguing that sequence variants common to many individuals are likely to be well-tolerated by the virus, while ones that are rare are likely to have high fitness costs. This justification is less clear for sequence variation within a single individual, where the viral population has had much less time to "explore" the sequence landscape. Nonetheless, there is precedent for this kind of analysis (see, e.g., Asti et al., PLoS Comput Biol 2016). If the authors took this approach, then this point should be discussed clearly and contrasted with the prior HIV and HCV studies.

      We thank the reviewer for pointing out the weakness in our explanation and description of the fitness model. The model was generated using publicly released viral sequences, as described in a previous publication (Hart et al. 2015). The T/F virus from each subject chronically infected with HCV in our cohort was scored with the model of Hart et al. to estimate the initial viral fitness of the T/F variant. The subvariants of the viral population sampled at each subsequent time point were scored with the same model (using the model for the corresponding subtype). For each subject, these subvariant fitness values were divided by the fitness value of the initial T/F virus; hence, the earliest time points, with no mutations in the epitope regions, have a relative fitness of 1.000. All other fitness values are therefore expressed relative to the T/F variant.
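
      The normalisation described above amounts to dividing each subvariant's model score by the T/F score. A minimal sketch, assuming positive fitness scores have already been produced by the Hart et al. model (which is not reproduced here); the variant labels and numbers are hypothetical:

```python
def relative_fitness(raw_fitness: dict[str, float], tf_label: str) -> dict[str, float]:
    """Express each subvariant's model fitness relative to the T/F variant (T/F -> 1.0).

    Assumes positive fitness scores; the model that produces them is not shown.
    """
    tf_value = raw_fitness[tf_label]
    if tf_value <= 0:
        raise ValueError("T/F fitness must be positive to normalise")
    return {label: value / tf_value for label, value in raw_fitness.items()}


# Hypothetical scores for one subject (illustrative only, not study data).
scores = {"TF": 0.80, "escape_early": 0.60, "escape_late": 0.76}
for label, rel in relative_fitness(scores, "TF").items():
    print(f"{label}: {rel:.3f}")
```

      Values below 1 indicate a subvariant estimated to be less fit than the founder; values that climb back toward 1 correspond to the rebound described in the summary.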

      We have further clarified this point in the Methods section “Estimating survival fitness of viral variants” to better describe how the data for the model were sourced (Lines 465-499).

      To add to the reviewer’s point, we agree that sequence variants common to many individuals are likely to be well tolerated by the virus, and we observed this in our findings: immune escape variants tended to revert to variants closer to the global consensus strain. Our previous publications have indicated that T/F viruses are variants that are “fit” for transmission between hosts; in particular, when the donor was a chronic progressor, a single T/F virus is often observed. Between transmission and adaptation to chronic infection in the new host, there is an intermediate process of genetic expansion via replication followed by a bottleneck under immune pressure, during which overall fitness (overall survivability, including replication and the exploration of immune escape pathways) can change. Under this assumption, we asked whether the observation reported in HIV studies (i.e. mutational landscapes that allow HIV to adapt to the host) also holds in HCV infections. Furthermore, the cohort used in this study is a rare cohort in which patients were tracked from uninfected, to HCV RNA+, to seroconversion, and finally to either clearance of the virus or progression to chronic infection. It is therefore important to understand the difference between clearance and chronic progression.

      Another important point for clarification is the definition of fitness. In the abstract, the authors note that multiple studies have shown that viral escape variants can have reduced fitness, "diminishing the survival of the viral strain within the host, and the capacity of the variant to survive future transmission events." It would be helpful to distinguish between this notion of fitness, which has sometimes been referred to as "intrinsic fitness," and a definition of fitness that describes the success of different viral strains within a particular individual, including the potential benefits of immune escape. In many cases, escape variants displace variants without escape mutations, showing that their ability to survive and replicate within a specific host is actually improved relative to variants without escape mutations. However, escape mutations may harm the virus's ability to replicate in other contexts. Given the major role that fitness plays in this paper, it would help readers if the authors clearly discussed how fitness is defined and distinguished between fitness within and between hosts (potentially also mentioning relevant concepts such as "transmission fitness," i.e., the relative ability of a particular variant to establish new infections).

      Thank you for pointing out the weakness in our definition of fitness. We have now clarified this in multiple sections of the paper: in the abstract at lines 18-21 and in the introduction at lines 64-69.

      These read:

      Lines 18-21: “However, this generic definition can be further divided into two categories: intrinsic fitness describes the viral fitness without the influence of any immune pressure, and effective fitness considers intrinsic fitness together with the influence of host immune pressure.”

      Lines 64-69: “This generic definition of fitness can be further divided into intrinsic fitness (also referred to as replicative fitness), where the fitness of the variant's sequence composition is estimated without the influence of host immune pressure. On the other hand, effective fitness (from here on referred to as viral fitness) considers fundamental intrinsic fitness with host immune pressure acting as a selective force that directs the mutational landscape (19), which subsequently influences future transmission events as it dictates which subvariants remain in the quasispecies.”

      One concern about the analysis is the use of Shannon entropy as a way to quantify the rate of escape. The authors describe computing the entropy at multiple time points preceding the time when escape mutations were observed to fix in a particular epitope. Which entropy values were used to compare with the escape rate? If just the time point directly preceding the fixation of escape mutations, could escape mutations have already been present in the population at that time, increasing the entropy and thus drawing an association with the rate of escape? It would also help readers if a definition of entropy were included in the methods, in addition to a reference to prior work. For example, it is not clear what is being averaged when "average SE" is described.

      We thank the reviewer for pointing out the ambiguity in our description of average SE. This has been rectified by adding more information to the Methods section (Lines 397 to 400):

      “Briefly, SE was calculated from the frequency of SNPs at each codon position and was further normalized by the number of codons in the sequence of the respective protein. An average SE value was calculated for each time point in each protein region for all subjects until the fixation event.”
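
      One reading of this description, sketched under assumptions: entropy is computed over the codons observed at each position in an in-frame alignment of the sequences sampled at one time point, using the natural logarithm, and the per-codon values are summed and divided by the number of codons. The text does not state the log base or whether codons or amino acids are counted, so both choices here are assumptions, and the example sequences are hypothetical.

```python
import math
from collections import Counter


def codon_entropies(aligned_seqs: list[str]) -> list[float]:
    """Shannon entropy (natural log) at each codon position of an in-frame nucleotide alignment."""
    length = len(aligned_seqs[0])
    if length % 3 != 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned, equal length, and in frame")
    entropies = []
    for start in range(0, length, 3):
        # Count the distinct codons observed at this position across all sequences.
        counts = Counter(s[start:start + 3] for s in aligned_seqs)
        n = sum(counts.values())
        entropies.append(-sum((c / n) * math.log(c / n) for c in counts.values()))
    return entropies


def average_se(aligned_seqs: list[str]) -> float:
    """Sum of per-codon entropies divided by the number of codons in the region."""
    per_codon = codon_entropies(aligned_seqs)
    return sum(per_codon) / len(per_codon)


# Hypothetical two-codon region sampled as four sequences at one time point.
print(average_se(["ATGAAA", "ATGAAG", "ATGAAA", "ATGAAG"]))
```

      A fully conserved codon contributes 0, so regions that tolerate little variation yield low average SE, matching the interpretation given in the response.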

      To answer the reviewer’s question, we computed entropy at multiple time points preceding the observation of the escape mutation. The escape rate was calculated for the epitopes targeted by the immune response. We compared the average SE of the protein region containing the epitope (computed per codon position and normalised by protein length) with the time it took the escape mutation to reach fixation. We observed that if the protein region had a higher rate of variation (i.e. higher average SE), an immune escape epitope also emerged more quickly. Since we took SE from the very first time point and all subsequent time points until fixation, we do not think that escape mutations already present in the population would alter the association with the rate of escape; in particular, these escape mutations were rarely observed at early time points. The escape variant likely becomes observable because of host immune pressure; the SE therefore reflects the freedom the virus has to explore the mutational landscape. If the region were highly restrictive, such that any mutation would result in a failed variant, then we would observe relatively lower values of average SE. In other words, the more variability the region tolerates, the greater the probability that the virus will find a route to immune escape.

      Reviewer #2 (Recommendations for the authors):

      In addition to the main points above, there are a few minor comments and suggestions about the presentation of the data.

      (1) It's not clear how, precisely, the model-based fitness has been calculated and normalized. It would be helpful for the authors to describe this explicitly. Especially in Figure 3, the plotted fitness values lie in dramatically different ranges, which should be explained (maybe this is just an error with the plot?).

      We have now clarified how the model-based fitness was calculated and normalized in the Methods section “Estimating survival fitness of viral variants” at lines 465-472.

      “The model used for estimating viral fitness has been previously described by Hart et al. (19). Briefly, the original approach used HCV subtype 1a sequences to generate the model for the NS5B protein region. To update the model for other regions (NS3 and NS2) as well as other HCV subtypes in this study, subtype 1b and subtype 3a sequences were extracted from the Los Alamos National Laboratory HCV database. An intrinsic fitness model was first generated for each subtype for the NS5B, NS3 and NS2 regions of the HCV polyprotein. Then, using longitudinally sequenced data from patients chronically infected with HCV, as well as clinically documented immune escape to describe high viral fitness variants, we generated estimates of the viral fitness for subjects chronically infected with HCV in our cohort.”

      Our apologies; there was an error with the plot in Figure 3. This has now been resolved.

      (2) In different plots, the authors show every pairwise comparison of ELISPOT values, population fitness, average SE, and rate of escape. It may be helpful to make one large matrix of plots that shows all of these pairwise comparisons at the same time. This could make it clear how all the variables are associated with one another. To be clear, this is a suggestion that the authors can consider at their discretion.

      Thank you for the suggestion to create a matrix of plots for pairwise comparisons. While this approach could indeed clarify variable associations, implementing it is outside the scope of this project. We appreciate the idea and may consider it in future studies as we continue to expand on this work.

    1. Author response:

      We have reviewed the helpful feedback from the reviewers and would like to thank them for their careful consideration of our manuscript. By way of provisional response, we agree with many of the above points and plan to revise our manuscript accordingly.

      In attempting to replicate some of the heme trafficking-related experiments in the original paper using a C. elegans model of TDD, we were either unable to do so or found an alternative explanation for the findings we could partially reproduce. As the reviewers correctly point out, there were some methodological and reagent-related differences between the study by Sun et al. and our own, which we will highlight more directly in a subsequent version of the manuscript. Additionally, where possible, we will attempt to replicate these experiments using the same protocol(s).

      We observed several phenotypic traits in the C. elegans model of TDD that were not described in prior studies. While we believe these features to be consistent with a bioenergetic problem in the worm, direct evidence for this is admittedly lacking in our original manuscript. We are actively engaged in experiments examining potential functions of HRG-9 and HRG-10 unrelated to heme trafficking and will consider which data best align with the scope of this study and thus warrant inclusion in a subsequent version of the manuscript. We will also provide a more comprehensive review of relevant data generated by other groups (e.g., lipid dysregulation, impaired autophagy, mitochondrial dysfunction in the absence of TANGO2) in the discussion section.

      Recommended improvements related to figure legends, terminology, and formatting will also be executed in our forthcoming version. On behalf of my co-authors and myself, thank you again for your time and effort improving this work.

    1. Author response:

      We thank both reviewers for their time and effort in considering our manuscript. We are pleased that the reviewers recognised the strength of our theoretical analysis and found it "elegant" and "reasonably accessible". We also acknowledge the suggestions made by both reviewers that the manuscript could be improved by more discussion of potential experiments. We were concerned not to make the original manuscript too long but, in the light of the reviewers' comments, we will submit a revised version with more details of the kinds of experiments that would build on the results that we have presented.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The present study aims to associate reproduction with age-related disease as support of the antagonistic pleiotropy hypothesis of ageing, predominantly using Mendelian Randomization. The authors found evidence that early-life reproductive success is associated with advanced ageing.

      Strengths:

      Large sample size. Many analyses.

      Weaknesses:

      There are some errors in the methodology that require revision.

      In particular, the main conclusions drawn by the authors refer to the Mendelian Randomization analyses. However, the authors made a few errors here that need to be reconsidered:

      (1) Many of the outcomes investigated by the authors are continuous outcomes, while the authors report odds ratios. This is not correct and should be revised.

      Thank you for your observation. We have revised the manuscript so that results for continuous outcomes are reported as beta coefficients, which indicate the change in the outcome per unit increase in exposure. This accurately reflects the nature of the analysis and provides a clearer interpretation of continuous outcomes (lines 56-109).

      (2) Some of the odds ratios (for example the one for osteoporosis) are really small, while still reaching the level of statistical significance. After some checking, I found the GWAS data used to generate these MR estimates were processed by the program BOLT-LMM. This program is a linear mixed model program, which requires transformation of the beta estimates to be useful for dichotomous outcomes. The authors should check the BOLT-LMM manual and recalculate the beta estimates of the SNP-outcome associations prior to the Mendelian Randomization analyses. This should be checked for all outcomes, as it doesn't apply to all.

      Thank you for your detailed feedback. We have reviewed all the GWAS data used in our MR analyses and confirmed that all GWAS of continuous traits were processed using BOLT-LMM, including age at menarche, age at first birth, BMI, frailty index, father's age at death, mother's age at death, DNA methylation GrimAge acceleration, age at menopause, eye age, and facial aging. Among the dichotomous outcomes, only osteoporosis was processed using BOLT-LMM; the others (late-onset Alzheimer's disease, type 2 diabetes, chronic heart failure, essential hypertension, cirrhosis, chronic kidney disease, early onset chronic obstructive pulmonary disease, breast cancer, ovarian cancer, endometrial cancer, and cervical cancer) were not. We have reprocessed the GWAS beta values for osteoporosis and repeated the MR analysis (lines 74-75; lines 366-373).
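
      The BOLT-LMM documentation describes an approximate conversion of its linear-scale effect estimates for a case-control trait to the log-odds scale: divide the estimate (and its standard error) by μ(1 − μ), where μ is the fraction of cases. A minimal sketch of that conversion; the case and control counts in the example are hypothetical, not those of the osteoporosis GWAS:

```python
import math


def bolt_beta_to_log_or(beta: float, se: float, n_cases: int, n_controls: int) -> tuple[float, float]:
    """Approximate log-OR and its SE from BOLT-LMM linear-scale estimates for a 0/1 trait."""
    mu = n_cases / (n_cases + n_controls)  # case fraction
    scale = mu * (1.0 - mu)
    return beta / scale, se / scale


# Hypothetical example: 10% cases.
log_or, log_or_se = bolt_beta_to_log_or(0.0009, 0.00018, n_cases=10_000, n_controls=90_000)
print(f"OR = {math.exp(log_or):.4f}, SE(log OR) = {log_or_se:.4f}")
```

      Because beta and SE are divided by the same factor, the z-statistic and p-value are unchanged; only the effect-size scale moves, which is why untransformed estimates can look implausibly small yet remain significant.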

      (3) The authors should follow the MR-STROBE guidelines for presentation.

      Thank you for your suggestion to follow the MR-STROBE guidelines for the presentation of our study. We appreciate the importance of adhering to these standardized guidelines to ensure clarity and transparency in reporting Mendelian Randomization (MR) analyses. We confirm that the MR components of our research are structured and presented following the MR-STROBE checklist. In addition to the MR analyses, our study also integrates Colocalization analysis, Genetic correlation analysis, Ingenuity Pathway Analysis (IPA), and population validation to provide a more comprehensive understanding of the genetic and biological context. While these analyses are not strictly covered by MR-STROBE guidelines, they complement the MR results by offering additional validation and mechanistic insights.

      We have structured our manuscript to separate these complementary analyses from the core MR results, maintaining alignment with MR-STROBE for the MR-specific components. The additional analyses are discussed in dedicated sections to highlight their unique contributions and avoid conflating them with the MR findings.

      (4) The authors should report data in the text with a 95% confidence interval.

      Thank you for your feedback. We have added the 95% confidence intervals for the reported data within the main text to enhance clarity and provide comprehensive context (lines 56-109). Additionally, the complete analysis data, including all detailed results, can be found in Table S3.
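
      For reference, the usual Wald-type 95% intervals behind such reporting can be sketched as follows: β ± 1.96·SE on the beta scale for continuous outcomes, and the exponentiated log-OR and bounds for binary outcomes. This is a generic sketch, not the authors' code, and the inputs are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def beta_ci(beta: float, se: float, z: float = Z_95) -> tuple[float, float]:
    """Wald confidence interval on the beta (linear) scale."""
    return beta - z * se, beta + z * se


def odds_ratio_ci(log_or: float, se: float, z: float = Z_95) -> tuple[float, float, float]:
    """OR with its CI, obtained by exponentiating the log-OR and its Wald bounds."""
    low, high = beta_ci(log_or, se, z)
    return math.exp(log_or), math.exp(low), math.exp(high)


print(beta_ci(0.12, 0.03))                  # continuous outcome: beta CI
print(odds_ratio_ci(math.log(1.25), 0.05))  # binary outcome: OR and CI
```

      The OR interval is asymmetric around the point estimate because it is symmetric on the log scale.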

      (5) The authors should consider correction for multiple testing.

      Thank you for your comment regarding the need to consider correction for multiple testing. We agree that correcting for multiple comparisons is an important step to control for the possibility of false-positive findings, particularly in studies involving large numbers of statistical tests. In our study, we carefully considered the issue of multiple testing and adopted the following approach:

      Context of Multiple Testing: The tests we conducted were hypothesis-driven, focusing on specific relationships (e.g., genetic correlation, colocalization, and Mendelian Randomization). These analyses are based on a priori hypotheses supported by existing literature or biological relevance.

      Statistical Methods: Where applicable, we applied appropriate measures to account for multiple tests. For instance, in Mendelian Randomization, sensitivity analyses serve to validate the robustness of the results.

      We believe that the methodology and corrections applied in our study appropriately address concerns about multiple testing, given the hypothesis-driven nature of our analyses and the rigorous steps taken to validate our findings. If you feel that additional corrections are required for specific parts of the analysis, we would be happy to further clarify or revise as needed.
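
      The response does not name a specific correction. For illustration only, two standard adjustments that could be applied to a family of MR p-values are Bonferroni, which controls the family-wise error rate, and Benjamini-Hochberg, which controls the false discovery rate. A minimal sketch with hypothetical p-values:

```python
def bonferroni(pvalues: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(pvalues)
    return [min(p * m, 1.0) for p in pvalues]


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # hypothetical MR p-values
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

      Bonferroni is the more conservative choice; Benjamini-Hochberg retains more power when many related exposure-outcome pairs are tested.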

      Reviewer #2 (Public review):

      Summary:

      The authors present an interesting paper where they test the antagonistic pleiotropy theory. Based on this theory they hypothesize that genetic variants associated with later onset of age at menarche and age at first birth have a positive causal effect on a multitude of health outcomes later in life, such as epigenetic aging and prevalence of chronic diseases. Using a Mendelian randomization and colocalization approach, the authors show that SNPs associated with later age at menarche are associated with delayed aging measurements, such as slower epigenetic aging and reduced facial aging, and a lower risk of chronic diseases, such as type 2 diabetes and hypertension. Moreover, they identified 128 fertility-related SNPs that are associated with age-related outcomes and they identified BMI as a mediating factor for disease risk, discussing this finding in the context of evolutionary theory.

      Strengths:

      The major strength of this manuscript is that it addresses the antagonistic pleiotropy theory in aging. Aging theories are not frequently empirically tested although this is highly necessary. The work is therefore relevant for the aging field as well as beyond this field, as the antagonistic pleiotropy theory addresses the link between fitness (early life health and reproduction) and aging.

      Points that have to be clarified/addressed:

      (1) The antagonistic pleiotropy is an evolutionary theory pointing to the possibility that mutations that are beneficial for fitness (early life health and reproduction) may be detrimental later in life. As it concerns an evolutionary process and the authors focus on contemporary data from a single generation, more context is necessary on how this theory is accurately testable. For example, why and how much natural variation is there for fitness outcomes in humans?

      Thank you for these insightful questions. We appreciate the opportunity to clarify how we approach the testing of AP theory within a contemporary human cohort and address the evolutionary context and comparative considerations with the disposable soma theory.

      We recognize that modern human populations experience selection pressures that differ from those in the past, which may affect how well certain genetic variants reflect historical fitness benefits. Nonetheless, the genetic variation present today still offers valuable insights into potential AP mechanisms through statistical associations in contemporary cohorts. We believe that AP can indeed be explored in current populations by examining genetic links between reproductive traits and age-related health outcomes. In our study, we investigate whether certain genetic variants linked to reproductive timing—such as age at menarche and age at first birth—also correlate with late-life health risks. By identifying SNPs associated with both early-life reproductive success and adverse aging outcomes, we aim to capture the evolutionary trade-offs that AP theory suggests.

      Despite contemporary selection pressures that differ from historical conditions, there remains natural genetic variation in traits like reproductive timing and longevity in humans today. This diversity allows us to apply MR to test causal relationships between reproductive traits and aging outcomes, providing insights into potential AP mechanisms. Prior studies have demonstrated that reproductive behaviors exhibit significant heritability and have identified genetic loci associated with reproductive timing (1,2). This genetic variation facilitates causal inference in modern cohorts, despite environmental and healthcare advances that might modulate these associations (3). By leveraging genetic risk scores for reproductive timing, our study captures the necessary variability to assess potential AP effects, thus providing valuable insights into how evolutionary trade-offs may continue to influence human health outcomes.

      What do the genetic risk score distributions of the exposure data look like?

      Thank you for your question. Our study is focused on Mendelian Randomization (MR) analysis, which aims to infer causal relationships between exposures and outcomes. While genetic risk scores (GRS) provide valuable insights at an individual level, they do not directly align with our study's objective, which is centered on population-level causal inference rather than individual-level genetic risk assessment. In MR, we use genetic variants as instrumental variables to determine the causal effect of an exposure on an outcome. GRS analysis typically focuses on summarizing an individual's risk based on multiple genetic variants, which is outside the scope of our current research. Therefore, we did not perform or analyze the distribution of genetic risk scores, as our primary goal was to understand broader causal relationships using established genetic instruments.

      Also, how can the authors distinguish in their data between the antagonistic pleiotropy theory and the disposable soma theory, which considers a trade-off between investment in reproduction and somatic maintenance and can be used to derive similar hypotheses? There is just a very brief mention of the disposable soma theory in lines 196-198.

      In our manuscript, we test AP theory specifically by examining genetic variants associated with reproductive timing and their association with age-related health risks in later life. MR and genetic risk scores allow us to assess these associations, directly testing the hypothesis that certain alleles enhancing reproductive success might have adverse effects on aging outcomes. This gene-centered approach aligns with AP’s premise of genetic trade-offs, enabling us to observe whether alleles associated with early-life reproductive traits correlate with increased risks of age-related diseases. Distinguishing from disposable soma theory, which would predict a general trade-off in energy allocation affecting somatic maintenance and not specific genetic effects, our data focuses on how certain alleles have differential impacts across life stages. Our findings thus support AP theory over disposable soma by highlighting the effects of specific genetic loci on both reproductive and aging phenotypes. However, future research could indeed explore the intersection of these theories, for example, by examining how resource allocation and genetic predispositions interact to influence longevity in various environmental contexts.

      (2) The antagonistic pleiotropy theory, used to derive the hypothesis, does not necessarily distinguish between male and female fitness. Would the authors expect that their results extrapolate to males as well? And can they test that?

      Emerging evidence suggests that early puberty in males is linked to adverse health outcomes, such as an increased risk of cardiovascular disease, type 2 diabetes, and hypertension in later life (4). A Mendelian randomization study also reported a genetic association between the timing of male puberty and reduced lifespan (5). These findings support the hypothesis that genetic variants associated with delayed reproductive timing in males might similarly confer health benefits or improved longevity, akin to the patterns observed in females. This would suggest that similar mechanisms of antagonistic pleiotropy could operate in males as well.

      In our study, BMI was identified as a mediator between reproductive timing and disease risk. Given that BMI is a common risk factor for age-related diseases in both males and females (6-9), it is plausible that similar mechanisms involving BMI, reproductive timing, and disease risk could exist in males. This shared mediator points to the possibility that, while reproductive timelines may differ, the pathways through which these traits influence aging outcomes may be consistent across sexes.

      AP theory could potentially be tested in males, as the principles of the theory may extend to analogous reproductive traits in males, such as age at puberty and testosterone levels, which could similarly influence health outcomes later in life. However, as our current study focuses specifically on female reproductive traits, testing the AP theory in males is outside the scope of this work. We acknowledge the importance of exploring these mechanisms in males, and we hope that future research will address this by investigating male-specific reproductive traits and their relationship to aging and health outcomes.

      (3) There is no statistical analyses section providing the exact equations that are tested. Hence it's not clear how many tests were performed and if correction for multiple testing is necessary. It is also not clear what type of analyses have been done and why they have been done. For example in the section starting at line 47, Odds Ratios are presented, indicating that logistic regression analyses have been performed. As it's not clear how the outcomes are defined (genotype or phenotype, cross-sectional or longitudinal, etc.) it's also not clear why logistic regression analysis was used for the analyses.

      Thank you for your thoughtful comments regarding the statistical analyses and the clarification of methods and variables used in the study.

      Statistical Analyses Section: We have included a detailed explanation of all statistical analyses in the Methods section (lines 291–408), specifying the rationale for the choice of methods, the variables analyzed, and their relationships. Additionally, we have provided the relevant equations or statistical models used where appropriate to ensure transparency.

      Beta Values and Odds Ratios: In the Results section (starting at line 56), both Beta values and Odds Ratios are presented: Beta values were used for analyses of continuous outcomes to quantify the linear relationship between predictors and outcomes. Odds Ratios (ORs) were calculated for binary or categorical disease outcomes to describe the relative odds of an outcome given specific exposures or independent variables.

      Validation and Regression Analyses: For further validation of the MR results, we conducted analyses using the UK Biobank dataset (starting at line 162). Logistic regression analysis was then employed for disease risk assessments involving categorical outcomes (e.g., diseased or not).

      We hope that this clarifies the methods and their applicability to our study, as well as the rationale for the presentation of Beta values and Odds Ratios. If further details or refinements are required, we are happy to incorporate them.

      (4) Mendelian Randomization is an important part of the analyses done in the manuscript. It is not clear to what extent the MR assumptions are met, how the assumptions were tested, and if/what sensitivity analyses are performed; e.g. reverse MR, biological knowledge of the studied traits, etc. Can the authors explain to what extent the genetic instruments represent their targets (applicable expression/protein levels) well?

      Thank you for your insightful comments regarding the Mendelian Randomization (MR) analysis and the evaluation of its assumptions. Below, we provide additional clarification on how the MR assumptions were addressed, sensitivity analyses performed, and the representativeness of the genetic instruments (starting at line 314):

      Relevance Assumption (Genetic instruments are associated with the exposure): “We identified single nucleotide polymorphisms (SNPs) associated with exposure datasets with p < 5 × 10⁻⁸ (10,11). In this case, 249 SNPs and 67 SNPs were selected as eligible instrumental variables (IVs) for exposures of age at menarche and age at first birth, respectively. All selected SNPs for every exposure were clumped to remove SNPs in linkage disequilibrium (r² = 0.001 and kb = 10,000).” “During the harmonization process, we aligned the alleles to the human genome reference sequence and removed incompatible SNPs. Subsequent analyses were based on the merged exposure-outcome dataset. We calculated the F statistics to quantify the strength of IVs for each exposure with a threshold of F>10 (12).”
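
      The p-value and F-statistic filters quoted above can be sketched per SNP using the common approximation F ≈ (β/SE)². This is a sketch under assumptions: the SNP records are hypothetical, and LD clumping is omitted because it requires an LD reference panel (tools such as PLINK or the TwoSampleMR R package handle that step).

```python
from dataclasses import dataclass


@dataclass
class SnpAssociation:
    rsid: str    # hypothetical identifiers in the example below
    beta: float  # SNP-exposure effect
    se: float    # standard error of beta
    pval: float  # association p-value


def select_instruments(snps, p_threshold=5e-8, f_threshold=10.0):
    """Keep SNPs passing the genome-wide p-value and F > 10 filters (clumping not shown)."""
    kept = []
    for snp in snps:
        f_stat = (snp.beta / snp.se) ** 2  # single-SNP approximation of the F statistic
        if snp.pval < p_threshold and f_stat > f_threshold:
            kept.append((snp.rsid, f_stat))
    return kept


example = [
    SnpAssociation("rs_example_1", 0.050, 0.005, 1e-20),
    SnpAssociation("rs_example_2", 0.010, 0.005, 0.05),
]
print(select_instruments(example))
```

      Under this approximation, any SNP reaching p < 5 × 10⁻⁸ already has |z| above roughly 5.45, and hence F above roughly 29.7, so the F > 10 filter is largely redundant here; it becomes informative when F is computed another way.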

      Independence Assumption (Genetic instruments are not associated with confounders; genetic instruments affect the outcome only through the exposure): We then checked whether the IVs were associated with potential confounders of the outcomes, using a database of human genotype-phenotype associations, PhenoScanner V2 (13,14) (http://www.phenoscanner.medschl.cam.ac.uk/), with a threshold of p < 1 × 10⁻⁵. IVs associated with education, smoking, alcohol, activity, and other confounders related to the outcomes were excluded.

      Sensitivity Analyses Performed: A pleiotropy test was used to check whether the IVs influence the outcome through pathways other than the exposure of interest. A heterogeneity test was applied to assess whether the causal effect estimates vary across IVs; significant heterogeneity indicates that some instruments may be invalid or that the causal effect varies depending on the IVs used. MR-PRESSO was applied to detect and correct for potential outlier IVs, with NbDistribution = 10,000 and a threshold of p = 0.05; outliers were excluded and the analysis repeated. For binary outcomes, the causal estimates were given as odds ratios (ORs) and 95% confidence intervals (CI). A leave-one-out analysis was conducted to ensure the robustness of the results by sequentially excluding each IV and confirming the direction and statistical significance of the estimate from the remaining SNPs.
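
      The response does not give the estimator behind these sensitivity analyses. Assuming the standard fixed-effect inverse-variance weighted (IVW) estimator, Cochran's Q as the heterogeneity statistic, and IVW refitted for the leave-one-out analysis, the core calculations can be sketched as follows. MR-Egger and MR-PRESSO are not reproduced, and the summary statistics in the example are hypothetical.

```python
import math


def ivw(bx: list[float], by: list[float], se_y: list[float]) -> tuple[float, float, float]:
    """Fixed-effect IVW causal estimate, its SE, and Cochran's Q across SNPs."""
    weights = [x * x / (s * s) for x, s in zip(bx, se_y)]
    numerator = sum(x * y / (s * s) for x, y, s in zip(bx, by, se_y))
    estimate = numerator / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    ratios = [y / x for x, y in zip(bx, by)]  # per-SNP Wald ratio estimates
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, se, q


def leave_one_out(bx, by, se_y):
    """Refit IVW with each SNP dropped in turn; returns (dropped index, estimate, SE)."""
    results = []
    for k in range(len(bx)):
        keep = [j for j in range(len(bx)) if j != k]
        est, se, _ = ivw([bx[j] for j in keep], [by[j] for j in keep], [se_y[j] for j in keep])
        results.append((k, est, se))
    return results


# Hypothetical SNP-exposure effects, SNP-outcome effects, and outcome SEs.
bx, by, se_y = [0.10, 0.20, 0.30], [0.06, 0.09, 0.16], [0.01, 0.02, 0.01]
print(ivw(bx, by, se_y))
for row in leave_one_out(bx, by, se_y):
    print(row)
```

      Q is compared against a χ² distribution with (number of SNPs − 1) degrees of freedom; a leave-one-out estimate that changes sign or loses significance flags an influential SNP.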

      Supplemental post-GWAS analysis: Colocalization analysis (starting at line 356), Genetic correlation analysis (starting at line 366).

      Our MR analysis adheres to the guidelines for causal inference in MR studies. By combining multiple sensitivity analyses and ensuring the quality of genetic instruments, we demonstrate that the results are robust and unlikely to be driven by confounding or pleiotropy.

      (5) It is not clear what reference genome is used and if or what imputation panel is used. It is also not clear what QC steps are applied to the genotype data in order to construct the genetic instruments of MR.

      Starting at line 314, the SNP selection steps are described in the Methods: “We identified single nucleotide polymorphisms (SNPs) associated with the exposure datasets at p < 5 × 10⁻⁸ (10,11). In total, 249 and 67 SNPs were selected as eligible instrumental variables (IVs) for age at menarche and age at first birth, respectively. The selected SNPs for each exposure were clumped to remove SNPs in linkage disequilibrium (r² = 0.001, kb = 10,000). We then checked whether the IVs were associated with potential confounders of the outcomes using PhenoScanner V2 (13,14) (http://www.phenoscanner.medschl.cam.ac.uk/), a database of human genotype-phenotype associations, at a threshold of p < 1 × 10⁻⁵. IVs associated with education, smoking, alcohol, activity, and other confounders related to the outcomes were excluded. During harmonization, we aligned the alleles to the human reference genome sequence and removed incompatible SNPs. Subsequent analyses were based on the merged exposure-outcome dataset. We calculated F statistics to quantify the strength of the IVs for each exposure, using a threshold of F > 10 (12). If the effect allele frequency (EAF) was missing from the primary dataset, it was obtained from dbSNP (https://www.ncbi.nlm.nih.gov/snp/) for the relevant population to calculate the F value.” The number of SNPs for each exposure-outcome pair and the F statistics are listed in Supplementary Table S2.
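      The role of EAF in the F calculation can be sketched in Python. The manuscript excerpt does not state which formula was used, so this assumes a widely used approximation: R² ≈ 2·EAF·(1 − EAF)·β² for a per-allele effect on a standardized exposure, and F = R²(N − k − 1)/((1 − R²)k) for k instruments in N samples. The simpler per-SNP approximation F ≈ (β/SE)² is included for comparison; all function names are hypothetical.

```python
def r2_from_eaf(beta, eaf):
    """Approximate exposure variance explained by one SNP.

    Assumes beta is a per-allele effect on a unit-variance (standardized) exposure.
    """
    return 2 * eaf * (1 - eaf) * beta**2


def f_from_r2(r2, n, k=1):
    """F statistic for k instruments jointly explaining r2 of exposure variance in n samples."""
    return (r2 * (n - k - 1)) / ((1 - r2) * k)


def f_from_beta_se(beta, se):
    """Per-SNP approximation F = (beta / se)^2, usable when EAF or N is unavailable."""
    return (beta / se) ** 2


def is_strong(f, threshold=10.0):
    """Conventional weak-instrument screen: keep instruments with F above the threshold."""
    return f > threshold
```

      For example, a SNP with β = 0.1 and EAF = 0.5 in a GWAS of about 100,000 individuals explains roughly 0.5% of the exposure variance, giving F ≈ 500, far above the F > 10 cut-off.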

      (6) A code availability statement is missing. It is understandable that data cannot always be shared, but code should be openly accessible.

      We have added it to the manuscript (starting at line 410).

      Reviewer #2 (Recommendations for the authors):

      (1) The outcomes seem to be genotypes (lines 274-288). In MR, genotypes are used as an instrument, representing an exposure, which is then associated with an outcome that is typically observed and measured at a later moment in time than the predictors. If both exposure and outcome are genotypes it is not clear how this works in terms of causality; it would rather reflect a genetic correlation. One would expect the genotypes that function as instruments for the exposure to have a functional cascade of (age-related) effects, leading to an (age-related) outcome. From line 149 the outcomes seem to be phenotypes. Can the authors please clearly explain in each section what is analyzed, how the analyses were done, and why the analyses were done that way?

      Thank you for your insightful comment. We understand the concern regarding the use of genotypes as both exposures and outcomes and the implications this has for interpreting causality versus genetic correlation. To clarify, in our study, the outcomes analyzed in the MR framework are indeed genotypes, starting from line 47. We use genotypes as instrumental variables for exposures, which are then linked to phenotypic outcomes observed at a later stage, in line with standard MR principles.

      To improve the robustness of the MR results, we validated the genetic associations in the population with phenotype data from UK Biobank (lines 162-203), and the detailed methods were listed in lines 385-408.

      (2) Overall, the English writing is good. However, some small errors slipped in. Please check the manuscript for small grammar mistakes like in sentences 10 (punctuation) and 33 (grammar).

      Thank you for your feedback. We appreciate your careful review and attention to detail. We thoroughly rechecked the manuscript for any grammatical errors, including punctuation and sentence structure, especially in sentences 11 and 35 in revised manuscript, as suggested.

      (3) There is currently no results and discussion section.

      The manuscript was submitted as Short Reports article type with a combined Results and Discussion section. We have added the section title of Discussion.

      (4) Why did the authors not include SNPs associated with age at menopausal onset? See for example: https://www.nature.com/articles/s41586-021-03779-7

      Thank you for this information. Our manuscript focuses on the antagonistic pleiotropy theory, which posits an inherent trade-off in natural selection: genes beneficial for early survival and reproduction (such as those affecting menarche and childbirth) may have costly consequences later in life. We therefore included only age at menarche and age at first childbirth as exposures in our research.

      (5) Can the authors include genetic correlations between menarche, age at first child, BMI, and preferably menopause?

      Thank you for your suggestion. We acknowledge that including genetic correlations between age at menarche, age at first childbirth, BMI, and menopause can provide valuable context to our analysis. While our current MR study sets age at menarche and age at first childbirth as exposures and menopause as the outcome, and we have already included results that account for BMI-related SNPs before and after correction, we recognize the importance of assessing genetic correlations.

      To address this, we calculated the genetic correlations between these traits to provide insight into their shared genetic architecture. This analysis helps clarify whether there is significant genetic overlap between the two exposures and between exposure and outcome, which can inform and support the interpretation of our MR results. We appreciate your suggestion and have included these calculations to enhance the robustness and comprehensiveness of our study. For the genetic correlation analysis, we applied the LDSC software and calculated the genetic correlation for all pairwise comparisons among age at menarche, age at first birth, BMI, and age at menopause onset (15,16). The results are listed in Table S6.

      (6) Lines 39-40: that is not entirely true. There is also mounting evidence that socioeconomic factors cause earlier onset of menarche through stress-related mechanisms: https://doi.org/10.1016/j.annepidem.2010.08.006

      Thank you so much for your information. We changed it to “Considering reproductive events are partly regulated by genetic factors that can manifest the physiological outcome later in life”.

      (7) Why did the authors choose to work with studies derived from IEU Open GWAS, as it often does not contain the most recent and relevant GWAS for a specific trait?

      We chose to work with studies derived from the IEU Open GWAS database after careful consideration of several sources, including the GWAS Catalog database and recently published GWAS papers. Our selection criteria focused on publicly available GWAS with large sample sizes and a higher number of SNPs to ensure robust analysis. For specific traits such as late-onset Alzheimer's disease and eye aging, we used GWAS data published in scientific articles to ensure that our research reflects the latest findings in the field.

      (1) Barban, N. et al. Genome-wide analysis identifies 12 loci influencing human reproductive behavior. Nat Genet 48, 1462-1472 (2016). https://doi.org/10.1038/ng.3698

      (2) Tropf, F. C. et al. Hidden heritability due to heterogeneity across seven populations. Nat Hum Behav 1, 757-765 (2017). https://doi.org/10.1038/s41562-017-0195-1

      (3) Stearns, S. C., Byars, S. G., Govindaraju, D. R. & Ewbank, D. Measuring selection in contemporary human populations. Nat Rev Genet 11, 611-622 (2010). https://doi.org/10.1038/nrg2831

      (4) Day, F. R., Elks, C. E., Murray, A., Ong, K. K. & Perry, J. R. Puberty timing associated with diabetes, cardiovascular disease and also diverse health outcomes in men and women: the UK Biobank study. Sci Rep 5, 11208 (2015). https://doi.org/10.1038/srep11208

      (5) Hollis, B. et al. Genomic analysis of male puberty timing highlights shared genetic basis with hair colour and lifespan. Nat Commun 11, 1536 (2020). https://doi.org/10.1038/s41467-020-14451-5

      (6) Field, A. E. et al. Impact of overweight on the risk of developing common chronic diseases during a 10-year period. Arch Intern Med 161, 1581-1586 (2001). https://doi.org/10.1001/archinte.161.13.1581

      (7) Singh, G. M. et al. The age-specific quantitative effects of metabolic risk factors on cardiovascular diseases and diabetes: a pooled analysis. PLoS One 8, e65174 (2013). https://doi.org/10.1371/journal.pone.0065174

      (8) Kivimaki, M. et al. Obesity and risk of diseases associated with hallmarks of cellular ageing: a multicohort study. Lancet Healthy Longev 5, e454-e463 (2024). https://doi.org/10.1016/S2666-7568(24)00087-4

      (9) Kivimaki, M. et al. Body-mass index and risk of obesity-related complex multimorbidity: an observational multicohort study. Lancet Diabetes Endocrinol 10, 253-263 (2022). https://doi.org/10.1016/S2213-8587(22)00033-X

      (10) Savage, J. E. et al. Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence. Nat Genet 50, 912-919 (2018). https://doi.org/10.1038/s41588-018-0152-6

      (11) Gao, X. et al. The bidirectional causal relationships of insomnia with five major psychiatric disorders: A Mendelian randomization study. Eur Psychiatry 60, 79-85 (2019). https://doi.org/10.1016/j.eurpsy.2019.05.004

      (12) Burgess, S., Small, D. S. & Thompson, S. G. A review of instrumental variable estimators for Mendelian randomization. Stat Methods Med Res 26, 2333-2355 (2017). https://doi.org/10.1177/0962280215597579

      (13) Staley, J. R. et al. PhenoScanner: a database of human genotype-phenotype associations. Bioinformatics 32, 3207-3209 (2016). https://doi.org/10.1093/bioinformatics/btw373

      (14) Kamat, M. A. et al. PhenoScanner V2: an expanded tool for searching human genotype-phenotype associations. Bioinformatics 35, 4851-4853 (2019). https://doi.org/10.1093/bioinformatics/btz469

      (15) Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 47, 1236-1241 (2015). https://doi.org/10.1038/ng.3406

      (16) Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291-295 (2015). https://doi.org/10.1038/ng.3211

    1. Author response:

      We thank the reviewers for their thoughtful comments and suggestions. We plan to make a number of revisions to the manuscript to address their feedback.

      Firstly, we plan to incorporate feedback related to our modeling approach. We will provide justification for the chosen models and why this dataset is not appropriate for an in-depth exploration of other models. In particular, we will highlight that the models included in this manuscript were taken from Langdon et al. (2019) with a minor extension. Model development and validation in the Langdon et al. (2019) paper required a dataset with >100 rats per task. As the current n per variant is 28-32, and behavioral performance on this task is highly variable, it would be difficult to sufficiently test the validity of models that majorly depart from the previously tested RL models. Nevertheless, we will acknowledge this as a limitation in the discussion section. Additionally, we will test some alternatives suggested by reviewers that fall within the scope of the current RL modeling framework (e.g., comparison to a standard delta-rule update for unrewarded choices). We will address other concerns brought up by reviewers by a.) providing a rationale for why we constrained our analyses to the first five sessions, b.) simulating data for sessions that match those that were analyzed in the real data (i.e., sessions 35-40 instead of 18-20), and c.) including a figure of the simulated choice probabilities rather than just risk score.

      Secondly, we will include additional analyses and clarify the current statistical approach to address comments on how the data were analyzed. We will include an analysis of task acquisition to investigate when choice preferences emerge across the different variants. We will justify the statistical approach used for detecting behavioral differences between task variants, including a better explanation of the inclusion of the risky/optimal label as a between-subjects factor in the ANOVAs. We will also expand the section on parameters predicting risk preference on the rGT to fully explain the statistical method used and provide a figure of the results.

      Lastly, we will provide a more detailed rationale for the reinforcer devaluation test, and describe the hypothesis it tests. We will also expand on how the results from the devaluation test support our conclusions, and address alternative explanations suggested by the reviewers.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1: 

      (1) As discussed in review and nicely simulated by the authors, the large figure error indicated by profilometry (~10 um in some cases on average) is inconsistent with the optical performance improvements observed, suggesting that those measurements are inaccurate.

      I see no reason to include these inaccurate measurements.  

      We agree with the Referee and removed the indicated figure (old Supplementary Fig. 4) and data.

      Reviewer #3:

      (1) It would be interesting to comment on how the addition of a coverslip changes the performance of the uncorrected microendoscope compared to the use of bare grin lenses. 

      We modified the discussion section (page 18) and added a new reference (#36) to include the request of the Referee.

      (2) In Figure 6C-H, the authors can indeed show data corresponding to all detected cells, but I still think that the statistics should be calculated using the same effective FOV. 

      We modified Figure 6 legend to include the request of the Referee.

      (3) Authors could present the images in Figures 4-6 as in the original version, with a scale bar in the centre of the FOV that is different for the two types of objectives (corrected vs uncorrected). They could add a short justification for this choice, and perhaps present the other version for Figure 4 in a supplementary information sheet (with similar scale bars at the centre of the FOV for both types of objectives). It would allow readers to appreciate that the FOV still appears significantly enlarged with this other presentation.

      As requested by the Referee, we modified the text in the Result section (page 11) and added the additional version of Figure 4 as Figure 4-figure supplement 1.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study presents potentially valuable insights into the role of climbing fibers in cerebellar learning. The main claim is that climbing fiber activity is necessary for optokinetic reflex adaptation, but is dispensable for its long-term consolidation. There is evidence to support the first part of this claim, though it requires a clearer demonstration of the penetrance and selectivity of the manipulation. However, support for the latter part of the claim is incomplete owing to methodological concerns, including unclear efficacy of longer-duration climbing fiber activity suppression.

      We sincerely appreciate the thoughtful feedback provided by the reviewer regarding our study on the role of climbing fibers in cerebellar learning. Each point raised has been carefully considered, and we are committed to addressing them comprehensively. We acknowledge the importance of addressing methodological concerns, particularly regarding the efficacy of long-term suppression of CF activity, as well as ensuring clarity regarding the penetrance and selectivity of our manipulation. To this end, we have outlined plans for substantial revisions to the manuscript to adequately address these issues.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Seo et al highlights knowledge gaps regarding the role of cerebellar complex spike (CS) activity during different phases of learning related to optokinetic reflex (OKR) in mice. The novelty of the approach is twofold: first, specifically perturbing the activity of climbing fibers (CFs) in the flocculus (as opposed to disrupting communication between the inferior olive (IO) and its cerebellar targets globally); and second, examining whether disruption of the CS activity during the putative "consolidation phase" following training affects OKR performance.

      The first part of the results provides adequate evidence supporting the notion that optogenetic disruption of normal CF-Purkinje neuron (PN) signaling results in the degradation of OKR performance. As no effects are seen in OKR performance in animals subjected to optogenetic irradiation during the memory consolidation or retrieval phases, the authors conclude that CF function is not essential beyond memory acquisition. However, the manuscript does not provide a sufficiently solid demonstration that their long-term manipulation of CF activity is effective, thus undermining confidence in the conclusions.

      Strengths:

      The main strength of the work is the aim to examine the specific involvement of the CF activity in the flocculus during distinct phases of learning. This is a challenging goal, due to the technical challenges related to the anatomical location of the flocculus as well as the IO. These obstacles are counterbalanced by the use of a well-established and easy-to-analyse behavioral model (OKR), that can lead to fundamental insights regarding the long-term cerebellar learning process.

      Weaknesses:

      The impact of the work is diminished by several methodological shortcomings.

      Most importantly, the key finding based on prolonged optogenetic inhibition of CFs (for 30 min to 6 hours after the training period) must be complemented by a demonstration that the manipulation maintains its efficacy. In its current form, the authors only show inhibition by short-term optogenetic irradiation in the context of electrical-stimulation-evoked CSs in an ex vivo preparation. As the inhibitory effect of even eNpHR3.0 is greatly diminished during seconds-long stimulations (especially when using a yellow laser, as is done in this work; see Zhang, Chuanqiang, et al. "Optimized photo-stimulation of halorhodopsin for long-term neuronal inhibition." BMC Biology 17.1 (2019): 1-17), we remain skeptical of the extent of inhibition during the long manipulations. In short, without a demonstration of effective inhibition throughout the putative consolidation phase (for example, by showing a significant decrease in CS frequency throughout the irradiation period), the main claim of the manuscript, phase-specific involvement of CF activity in OKR learning, cannot be considered to be supported by evidence.

      Second, the choice of viral targeting strategy leaves gaps in the argument for CF-specific mechanisms. CaMKII promoters are not selective for the IO neurons, and even the most precise viral injections always lead to the transfection of neurons in the surrounding brainstem, many of which project to the cerebellar cortex in the form of mossy fibers (MF). Figure 1Bii shows sparsely-labelled CFs in the flocculus, but possibly also MFs. While obtaining homogenous and strong labeling in all floccular CFs might be impossible, at the very least the authors should demonstrate that their optogenetic manipulation does not affect simple spiking in PNs.

      Finally, while the paper explicitly focuses on the effects of CF-evoked complex spikes in the PNs and not, for example, on those mediated by molecular layer interneurons or via direct interaction of the CF with vestibular nuclear neurons, it would be best if these other dimensions of CF involvement in cerebellar learning were candidly discussed.

      We appreciate the reviewer’s thorough evaluation, which thoughtfully highlights the strengths and areas for improvement in our study.

      We agree with the reviewer’s recognition of the novelty of our approach, particularly in specifically perturbing climbing fiber (CF) activity in the flocculus and examining its effects across distinct phases of learning. Additionally, our use of the well-established OKR behavior paradigm provides a robust framework for investigating cerebellar learning processes, further strengthening our study.

      To address concerns regarding the efficacy of long-term optogenetic inhibition and the specificity of viral targeting, we conducted additional experiments. These include in vivo monitoring of CF activity during the irradiation period, confirming sustained inhibition of complex spikes throughout the consolidation phase. To ensure precise targeting and rule out potential side effects, such as unintended modification of Purkinje cell (PC) simple spike activity, we demonstrated that optogenetic suppression of CF transmission did not affect simple spike firing. Furthermore, we performed additional characterization to confirm the specificity of viral targeting.

      Lastly, we recognize the importance of exploring alternative mechanisms underlying CF involvement in cerebellar learning. Accordingly, we expanded the manuscript to provide a more comprehensive discussion of these mechanisms, offering a clearer perspective on the broader implications of our findings.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to explore the role of climbing fibers (CFs) in cerebellar learning, with a focus on optokinetic reflex (OKR) adaptation. Their goal was to understand how CF activity influences memory acquisition, memory consolidation, and memory retrieval by optogenetically suppressing CF inputs at various stages of the learning process.

      Strengths:

      The study addresses a significant question in the cerebellar field by focusing on the specific role of CFs in adaptive learning. The authors use optogenetic tools to manipulate CF activity. This provides a direct method to test the causal relationship between CF activity and learning outcomes.

      Weaknesses:

      Despite shedding light on the potential role of CFs in cerebellar learning, the study is hampered by significant methodological issues that call the validity of its conclusions into question. The absence of detailed evidence on the effectiveness of CF suppression and concerns over tissue damage from optogenetic stimulation weaken the argument that CFs are not essential for memory consolidation. These challenges make it difficult to confirm whether the study's objectives were fully met or whether the findings conclusively support the authors' claims. The research commendably attempts to unravel the temporal involvement of CFs in learning but also underscores the difficulty of pinpointing the specific neural mechanisms that underlie the phases of learning. Addressing these methodological issues, investigating other signals that might instruct consolidation, and understanding CFs' broader impact on various learning behaviors are crucial steps for future studies.

      We appreciate the reviewer’s recognition of the significance of our study in addressing the fundamental question of the role of CF in adaptive learning within the cerebellar field. The use of optogenetic tools indeed provides a direct means to investigate the causal relationship between CF activity and learning outcomes.

      To address concerns regarding the effectiveness of CF suppression during consolidation, we plan to conduct further in-vivo recordings. These will demonstrate how reliably CF transmission can be suppressed through optogenetic manipulation over an extended period.

      In response to the concern about potential tissue damage from laser stimulation, we believe that our optogenetic manipulation was not strong enough to induce significant heat-induced tissue damage in the flocculus. According to Cardin et al. (2010), light applied through an optic fiber may cause critical damage if the intensity exceeds 100 mW, roughly eight times the intensity we used in our OKR experiment. Furthermore, had chronic laser stimulation caused tissue damage, we would expect impaired long-term memory, reflected in abnormal gain retrieval when tested the following day. However, as shown in Figures 2 and 3, there were no significant abnormalities in consolidation percentages even after the optogenetic manipulation.

      Finally, we appreciate the reviewer’s recognition of the challenges involved in pinpointing specific neural mechanisms. We plan to expand the discussion to address these complexities and outline future research directions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Inhibitory optogenetic actuators are generally problematic, especially in time frames longer than seconds. If the authors wish to be able to inhibit activity in the flocculus-targeting CFs for a long time, maybe it would make sense to try to retrogradely transfect the IO neurons from the flocculus (using a cre-lox approach) with inhibitory DREADDs. This approach is also full of problems, so the absence or significant decrease in CS activity throughout the period of manipulation must be demonstrated.

      In addition to re-examining the strength of the evidence regarding the role of CFs in the consolidation and retrieval phases, the manuscript would benefit from significant reworking of the details in the text and figures. Below is a possibly incomplete list of things we would want to highlight:

      (1) While the text states that the authors "... verified the potential reduction of Cs firing rate in PCs of awake mice in vivo by inhibiting CF signals", neither the data nor a figure is shown. This is of critical importance when judging the reliability of the following results. The data presented in Figure 1D-E should also be made more informative; specifically, the EPSC waveforms should be shown at higher resolution. We are not told how many cells/slices/animals the results were obtained from, nor how many trials were done per condition. Finally, the in vitro data are from vermal Purkinje neurons, while the focus of the work is the flocculus. Please provide these verifications for the flocculus.

      To verify the suppression of complex spike (Cs) activity, we conducted additional in-vivo experiments and added Figure 2, which presents recordings of Cs firing rates from Purkinje cells (PCs) during optogenetic suppression of climbing fiber (CF) activity. These data demonstrate that the suppression specifically and robustly targets Cs activity without affecting simple spike firing, as shown in Figure 2C. The results presented in Figure 2 were acquired at 40 minutes of optostimulation, consistently showing effective suppression of Cs activity throughout this period. While continuous recordings over several hours were not performed, the stability and sustained suppression observed at the 40-minute mark strongly suggest that the manipulation remains effective during the extended durations required for the behavioral tests.

      Additionally, we have improved Figure 1D by enhancing the resolution of EPSC waveforms and including more detailed information in the figure legend regarding the number of cells and animals analyzed. For the current-clamp mode data (Figures 1E and F), we clarified the experimental conditions to provide additional context. While the in vitro data were collected from vermal PCs, these experiments were intended to illustrate the fundamental properties of CF-PC transmission.

      (2) It is challenging to get a homogeneous transfection of all CFs in a given region. To be able to judge the significance of the results, readers should be provided with material that allows them to assess the transfection quality. The images shown in panels Bi-ii are spatially restricted and of too low quality to make judgements. Also, it is not stated whether the images shown are from GFP- or NpHR-transfected animals. These different payloads are delivered using different viral capsids (AAV1 vs. AAV9) that have significantly different transfection capacities, and results from AAV9-CaMKII-GFP cannot be generalized to AAV1-CaMKII-NpHR. Please show the expression for the capsid used with NpHR.

      To clarify, the images in Figure 1Bi-ii are representative of GFP expression in animals transfected with AAV1-CaMKII-EGFP. The purpose of these panels is to confirm successful targeting of the region of interest rather than to evaluate viral tropism or capsid-specific transfection efficiency. Moreover, while the transfection characteristics of AAV1 and AAV9 may differ, the key experimental parameter, effective CF suppression, was validated through in vivo electrophysiological recordings, which robustly confirm the efficacy of NpHR expression.

      (3) Finally, please show the location of the optic fiber implant in the flocculus from post-mortem images.

      In Figure 3a of our revised manuscript, we added post-mortem histological images showing the exact location of the optic fiber implants in the flocculus. These images confirm that the optogenetic stimulation was targeted to the correct anatomical region, ensuring that the observed effects are attributable to CF manipulation in the flocculus.

      Reviewer #2 (Recommendations For The Authors):

      (1) The efficacy of CF suppression is questionable. The histology in Figure 1 shows that only a handful of CFs are transduced in their approach. This observation casts doubt on the claimed complete suppression of CF-evoked EPSCs in every recorded PC in the same figure. This necessitates a more detailed explanation for this apparent discrepancy. Also, the absence of current-clamp recordings to measure the effect on CF-evoked complex spiking in PCs and the lack of detail regarding the timing of optogenetic actuation (continuous or pulsed) during these slice experiments are also significant omissions.

      We are providing additional in vivo electrophysiological recordings showing sustained CF suppression in awake animals (Figure 2). These recordings directly demonstrate the extent of CF-evoked complex spike (Cs) suppression.

      Moreover, we have included additional data of current-clamp recordings to measure the impact of CF suppression on Cs activity (Figures 1E and 1F). Regarding the timing of the optogenetic actuation, the stimulation was applied continuously in the slice experiments.

      (2) The authors claim that their method effectively suppresses CF activity in vivo, yet they do not present any supporting data. Given the histological evidence provided, it's questionable whether their approach truly impacts the CF population broadly, casting doubts on the efficacy of their suppression approach to identify the role of CFs during behavior. To address these concerns, further experiments and detailed quantification are essential to validate the extent and uniformity of CF suppression achieved.

      As noted above, we conducted additional in vivo experiments with continuous recordings of CF-evoked complex spike (Cs) activity during optogenetic suppression (Figure 2). These data directly demonstrate effective and sustained inhibition of CF transmission throughout the behavioral experiments. Quantification revealed consistent inhibition of CF activity across the manipulation period, with no observable change in Purkinje cell simple spike firing rates, confirming that our intervention specifically targeted CF activity without off-target effects. In addition to the in vivo data, the in vitro data presented in Figure 1 (lines 107-116) further validate the efficacy of our optogenetic manipulation, showing consistent suppression of CF transmission without any failures. Together, these findings confirm the reliability and specificity of our suppression approach for studying CF contributions to behavior.

      (3) To optogenetically test the role of CFs in memory consolidation, the authors deliver continuous, high-power light to the flocculus (13 mW for 6 hrs). This extends well beyond typical experimental conditions. The sustained nature of the light exposure thus brings into question the consistency and reliability of CF suppression over time. Firstly, it is imperative to determine whether CF activity is suppressed throughout this extended period. Secondly, the intensity and duration of light exposure carry a significant risk of causing extensive damage to the surrounding tissue. Given these concerns, a thorough histological examination is warranted to assess the potential adverse effects on tissue integrity. Such an analysis is crucial not only for validating the experimental outcomes but also for ensuring that the observed effects are not confounded by light-induced tissue damage.

      To address whether CF activity remains suppressed throughout the extended period, we included new in vivo recordings demonstrating robust suppression of CF transmission, with complex spikes still inhibited after 40 minutes of optostimulation. Regarding potential tissue damage, our optogenetic protocol used a light intensity of 13 mW, well below the 75 mW that Cardin et al. (2010) reported as still compatible with normal neuronal activity. Moreover, critical damage typically requires intensities exceeding 100 mW for several hours (Cardin, Jessica A., et al. "Targeted optogenetic stimulation and recording of neurons in vivo using cell-type-specific expression of Channelrhodopsin-2." Nature Protocols 5.2 (2010): 247-254.). Finally, we observed no abnormalities in long-term memory consolidation or gain retrieval (Figures 3C, 4C, 4F), further supporting that our light stimulation did not induce tissue damage.

      (4) The generalizability of their findings to various learning behaviors remains uncertain. Given that the flocculus plays a role in vestibulo-ocular reflex (VOR) adaptation, which encompasses both CF-dependent and CF-independent learning types (gain increase and gain decrease, respectively), this system could offer a more feasible approach for investigating hypotheses about the role of CFs in guiding distinct learning processes.

      In response to the reviewer’s comment on the generalizability of our findings to learning behaviors involving both CF-dependent and CF-independent mechanisms, we acknowledge the importance of examining these dynamics in cerebellar motor adaptation systems such as the OKR. Although our study used an OKR task, findings from VOR studies are relevant here. Ke et al. (2009) demonstrated that VOR gain increases (CF-dependent) and gain decreases (CF-independent) involve distinct plasticity processes (Ke, Michael C., Cong C. Guo, and Jennifer L. Raymond. "Elimination of climbing fiber instructive signals during motor learning." Nature Neuroscience 12.9 (2009): 1171-1179), suggesting that CF engagement is task-dependent, particularly for larger error signals that require CF-guided adaptation.

      Similarly, our OKR findings suggest that CF-dependent pathways are likely used for large, persistent errors, whereas CF-independent mechanisms may drive more gradual adjustments. This alignment between OKR and VOR systems supports the generalizability of CF-selective adaptation across cerebellar learning tasks. We have elaborated on this point in our revised manuscript (lines 219-237), clarifying how CF-dependent and CF-independent mechanisms can generalize across motor learning contexts in the cerebellum.

      (5) The acute effect of CF suppression on OKR eye movements warrants investigation. If OKR eye movements are altered by their method, this could complicate the interpretation of their results.

      During our experiments, we monitored eye movements during CF optogenetic manipulation and found no aberrant effects such as nystagmus. As shown in Figures 4G and 4H, disrupting CF signaling during gain retrieval did not alter the gain, confirming that our manipulation neither acutely affects ocular reflexes nor induces abnormal eye movements. We therefore conclude that the observed effects are specific to learning and memory processes.

      (6) The authors raise the potential issue of inducing presynaptic LTD in CFs. Can they be sure that their manipulation doesn't generate a similar effect? Additional controls or techniques to accurately interpret the results are needed considering this concern.

      We note that our discussion does not claim that optogenetic suppression directly induces CF-LTD. Instead, we posit that CF suppression may have mimicked the functional consequences of CF-LTD, such as reduced complex spike (Cs) activity and associated calcium signaling. This, in turn, may have indirectly interfered with the induction of parallel fiber-Purkinje cell (PF-PC) LTD, thereby preventing gain enhancement during learning.

      This hypothesis is consistent with previous studies highlighting the interplay between CF and PF synaptic plasticity in cerebellar motor learning. For example, Hansel and Linden (2000) and Weber et al. (2003) discuss how changes at CF synapses can modulate Cs waveforms and calcium dynamics, which are critical for PF-PC LTD. Coesmans et al. (2004) and Han et al. (2007) further elaborate on the necessity of CF input for effective PF-PC LTD induction during learning tasks such as retinal slip correction.

      While our experiments were not designed to directly measure CF-LTD, the observed prevention of gain enhancement aligns with the hypothesis that CF suppression functionally disrupted downstream PF-PC LTD. We have clarified these points in our revised manuscript (lines 250-258) to avoid misunderstanding.

      (7) The specific timeframe for OKR consolidation remains uncertain, with evidence from numerous studies indicating that cerebellar memory consolidation unfolds over several days. Therefore, a more thorough investigation into these extended durations, supported by control experiments to validate the outcomes, would significantly strengthen the study's conclusions, and provide clearer insights into the consolidation process of OKR learning.

      Our current study specifically focused on the early phase of the post-learning period, as supported by findings from several studies: Cooke et al. (2004); Titley et al. (2007); Steinmetz et al. (2016); Seo et al. (2024).

      These studies collectively indicate that cerebellar-dependent memory consolidation—including OKR—can occur rapidly during the early consolidation phase. While the specific mechanisms examined in these studies vary (e.g., synaptic plasticity, intrinsic plasticity, or circuit-level changes), they consistently demonstrate that modifications in the cerebellum after the early consolidation period no longer influence memory storage or performance. This evidence strongly supports the relevance of our experimental focus and the timing of our interventions.

      We acknowledge the importance of investigating extended consolidation periods, which could indeed provide additional insights. However, given our current aims, the rapid consolidation dynamics observed in the early phase are most relevant to the questions addressed in this study. We have elaborated on this matter in our revised manuscript (lines 273-283).

      (8) Issues around whether the authors have control over CF activity with their optogenetic intervention raise questions of whether learning can be recovered during the training procedure if the optogenetic stimuli are halted. Specifically, if suppression is applied for three blocks (what the authors refer to as "sessions") during the training procedure and then ceases, does learning rapidly recover in the immediately following blocks?

      While we did not directly examine the restoration of learning capability within the same training session following the cessation of optogenetic inhibition, we believe several aspects of our experimental design and insights from prior studies support our interpretation.

      Our optogenetic intervention specifically targeted Purkinje cells (PCs) in the flocculus and was applied continuously during designated training sessions to modulate cerebellar activity. Notably, Medina et al. (2001) demonstrated that transient inactivation of the cerebellar cortex impairs the expression of learned responses but does not disrupt the underlying plasticity mechanisms (Medina, Javier F., Keith S. Garcia, and Michael D. Mauk. "A mechanism for savings in the cerebellum." Journal of Neuroscience 21.11 (2001): 4081-4089.). This finding suggests that cerebellar plasticity remains intact and functional even after transient perturbations.

      Therefore, it is plausible that once optogenetic inhibition is lifted, the cerebellar network regains its capacity for learning and adaptation, as the intrinsic plasticity and memory encoding processes remain preserved. While we acknowledge that direct experimental confirmation of rapid recovery in our setup was not performed, this interpretation is consistent with our experimental framework and the broader literature.

      (9) The study does not fully explore the instructive signals/mechanisms underlying the memory consolidation process. A detailed investigation into potential instructive signals for consolidation beyond CF-induced signaling, like the simple spiking of PCs, could significantly enhance the study's conclusions. Indeed, there is currently no evidence to suggest that CFs play a role in the consolidation phase anyway so testing their role seems a bit of a strawman argument.

      While our study primarily focused on characterizing CF-dependent pathways, we acknowledge that memory consolidation is likely driven by a multifaceted interplay of instructive signals beyond CF-induced mechanisms. In particular, Purkinje cell (PC) simple spiking may act as a critical signal during the consolidation phase, either complementing or functioning independently of CF input. Emerging evidence suggests that simple spiking can modulate downstream circuitry in ways that stabilize and strengthen memory traces.

      To address this, we have expanded the discussion in the revised manuscript to explore potential instructive signals for consolidation, including PC simple spiking, local circuit plasticity within the cerebellar cortex, and its interaction with the cerebellar nuclei. We propose that these mechanisms collectively contribute to the transfer and stabilization of motor memory, offering a more comprehensive framework for understanding consolidation. We have elaborated on this matter in our revised manuscript (lines 238-250).

      (10) Previous reports have highlighted the necessity of CF activity for extinction/memory maintenance (Medina et al. 2002; Kim et al. 2020). That is, the absence of CF activity is consequential for cerebellar function. These results present a potential contrast to the findings reported in this current study. This discrepancy raises important questions about the experimental conditions, methodologies, and interpretations of CF function across different studies. A thorough discussion comparing these divergent outcomes is essential, as it could elucidate the specific contexts or conditions under which CF activity influences memory processes.

      We acknowledge that previous studies (Medina et al., 2002; Kim et al., 2020) have suggested a role for climbing fiber (CF) activity in extinction. However, our study specifically focuses on the acquisition phase of motor learning and does not extend to extinction or maintenance. As such, we have revised our discussion to limit interpretations strictly to the scope of our findings and removed references to extinction.

      The discrepancies between our results and prior work may arise from differences in methodologies and behavioral paradigms. For instance, we utilized optogenetic inhibition to achieve precise temporal and spatial control of CF activity, whereas previous studies employed pharmacological or lesion methods that may have broader effects on the cerebellar circuitry. Additionally, differences in behavioral paradigms, such as the optokinetic reflex (OKR) task used in our study compared to the eye-blink conditioning tasks in prior studies, may demand distinct roles for CF signaling depending on the specific requirements for error correction and adaptation.

      This clarification is now incorporated into our revised manuscript, and the discussion has been streamlined to focus on the phase-specific role of CF activity during acquisition without extending to extinction or maintenance (lines 259-270).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      The article emphasizes vocal social behavior, but none of the experiments involve a social element. Marmosets are recorded in isolation, which could be sufficient for examining the development of vocal behavior in that particular context. However, the early-life maturation of vocal behavior is strongly influenced by social interactions with conspecifics. For example, the transition from cries and subharmonic phees, which are high-entropy calls, to lower-entropy mature phees is affected by social reinforcement from the parents. This effect also extends across contexts: differences in these interaction patterns carry over to vocal behavior when the marmosets are alone. From the chord diagrams, cries still make up a significant proportion of calls in lesioned animals. Additionally, though it is an intriguing finding that the infants' phee calls have acoustic differences, being 'blunted of variation, less diverse and more regular,' the suggestion that the social message conveyed by these infants was 'deficient, limited, and/or indiscriminate' is not directly supported by the data, but could be tested with, for example, playback experiments.

      We recognize that our definition of vocal social behavior is not within the normal realm of direct social interactions. We were particularly interested in marmoset vocalizations that act as social signals, such as phees, cries and twitters, even when family members or conspecifics are not visibly present. Generally speaking, in the laboratory, infant marmosets make few calls in the presence of another conspecific, but when isolated they naturally make phee calls to reach out to distantly located relatives. In this context, while we did not assess the animals interacting directly, we assessed what are normally referred to as ‘social contact calls,’ hence the term ‘social vocalizations.’ Playback recordings might provide evidence of antiphonal calling as a means of social interaction and might reveal the poor quality of the social message conveyed by the infant, but even here, the vocalizing marmoset would be calling to a non-visible conspecific. Thus, although our experiment lacked a direct social element, our data suggest that in the absence of a functioning ACC in early life, infant calls that convey social information, and which would elicit feedback from parents and other family members, may be compromised, and this could potentially influence how that infant develops its social interactive skills. We have now commented on the significance of social vocalizations in the introduction (page 3) and discussion (page 15).

      The manuscript would benefit from the addition of more details to be able to better determine if the conclusions are well supported by the data. Understanding that this is very difficult data to get, the number of marmosets and some variability in the collection of the data would allow for the plotting of each individual across figures. For example, in the behavioral figures, which is the marmoset that is in the behavioral data that has a sparing of the ACC lesion in one hemisphere? Certain figures, described below in the recommendations for the authors, could also do with additional description.

      Thanks for these suggestions. We have plotted the individual animals in the relevant figures and addressed the comments and recommendations listed below.

      Reviewer #1 (Recommendations For The Authors):

      Given the number of marmosets, variability in the collected data, lesion extent, and different controls, I would like to see more plots with individuals indicated (perhaps with different symbols). More details could also be added for several plots.

      Figure 2D (new) and 2E now have plots that represent the individual animals, each represented by a different symbol.

      Figure 2A) Since lesions are bilateral, could you also show the extent of the lesions on the other side for completeness?

      Our intention was to process one hemisphere of each brain for Golgi staining to examine changes in cell morphology in the ACC and associated brain regions following the lesion. Unfortunately, the Golgi stain was unsuccessful, so we were unable to use that tissue to reconstruct the bilateral extent of the lesion. We did, however, first establish the bilateral nature of the lesion from coronal slices of each animal's MRI scan before processing the intact hemisphere. The MRI scans (every 5th section) for each control and lesioned animal are compiled in a supplementary figure (Fig. S1). These scans show that the ACC-lesioned animals have bilateral lesions, with one animal (ACC1) showing some sparing in one hemisphere, as we noted in the text. We now refer to this supplementary figure in the text (page 5).

      Figure 2B/C) In Figure 2B, control and ACC lesions are in the columns while right next to it in 2C, ACC lesion and control are in the rows. Could these figures be adjusted so that they are consistent?

      We have now adjusted these figures and updated the figure legends accordingly.

      Figure 2C) Is there quantification of the 'loss of neurons and respective increase in glial cells at the lesioned site especially at the interface between gray and white matter'? There are multiple slices for each animal.

      Thanks for suggesting this. We have now quantified these data which are presented as a new graph as Fig. 2D. These data revealed a significant loss of neurons (NeuN) in the ACC group as well as an increase in glial cells (GFAP and Iba1) relative to the controls. The figure legend and results have also been updated.

      Figure 2C) It is difficult for me to distinguish between white and purple - could you show color channels independently since images were split into separate channels for each fluorophore?

      Fig. 2C has been revised to better visualize the neurons and glia at the gray and white matter interface. We found that grayscale images for each channel offered a better contrast than separating the channels for each fluorophore.

      Figure 2C/D) I like how there are individual dots here for the individual marmosets. Since there are four in each group, could they be represented throughout with symbols (with a key indicating the pair and also the control condition)? For example, were there changes in the histology for control animals that got saline injections as opposed to those that didn't get any surgery?

      We have highlighted the individual animals with different symbols in the figures. Although some animals were twins, it was not possible to use twin pairs in all cases; only two sets were twins. We have indicated the symbols that represent the twin pairs in Fig. 2 as well as the MRI scans of the twin pairs in Fig. S1. There were no observed changes in histology for the sham animals relative to the other non-sham controls. The MRI scan for one sham animal (CON2) shows herniated tissue in the right hemisphere, which is a normal consequence of brain exposure during a craniotomy.

      Figure 3D-E) Here, individual data points could be informative especially given that some animals are missing data past the third week.

      To prevent cluttering the figure with too many data points, we have added the sample size for each group in the figure legend (page 33).

      Figure 3D/F) What exactly is the period that goes into this analysis? In the text, 'Further analysis showed that the ACC lesion had minimal effects on the rate of most call types during this period'. Is this period from weeks 3 to 6 relative to the proportions in week 2? I also don't quite understand the chord diagram. The legend says 'the numbers around each chord diagram represents relative probability value for each call type transition', so how does that relate to the proportion of these call types? It looks like there is a wider slice for cries for ACC-lesioned animals each week. I also don't see, in the week 4 chord diagram, what the text describes as 'elevation in the rate of 'other' calls, which comprised tsik, egg, eck, chatter and seep calls. These calls were significantly elevated in animals after the ACC lesion.'

      We apologize for the confusion. Fig. 3D and Fig. 3F are not directly related. Fig. 3D shows the different types of emitted calls as averaged data per group, pooled across post-surgery weeks (weeks 3-6). It represents the proportion of individual call types relative to the total number of calls during each recording period. The only major finding here was the increased rate of ‘other’ calls comprising tsik, egg, ock, chatter and seep calls. These calls were significantly elevated in animals after the ACC lesion.

      While Fig. 3D represents differences in the proportion of calls, the chord diagrams in Fig. 3F represent the probability of call-to-call transitions obtained from a probability matrix. At postnatal week 6, marmosets with ACC lesions showed a higher likelihood of transitions between all call types, but less frequent transitions between social contact calls, relative to sham controls. The chord diagrams visualize the weighted probabilities and directionality of these transitions between the different call types. Weighted probabilities were used to account for variations in call counts. The thickness of the arrows or links indicates the probability of a call transition, while the numbers surrounding each chord diagram represent the relative probability value for each specific transition. We have now reworded the text and clarified these details in the figure legend (pages 32-33).
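      As a rough illustration of how a call-to-call transition matrix of this kind might be estimated, the Python sketch that follows counts consecutive call pairs within each recording and normalizes each row by the number of transitions leaving that call type, so abundant call types do not dominate simply because they are frequent. This is an assumption about the method: the exact weighting used for Fig. 3F is not specified in this response, and the function name and input format are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order call-to-call transition probabilities.

    Hypothetical input format: a list of recordings, each a list of call labels
    in the order they were emitted. Returns {from_call: {to_call: probability}},
    with each row summing to 1. Row normalization is an ASSUMED stand-in for the
    "weighted probabilities" described in the text.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each consecutive pair (current call -> next call); pairs never
        # span two recordings.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        # Divide by the number of transitions leaving this call type, giving
        # P(next call | current call).
        probs[current] = {nxt: n / total for nxt, n in row.items()}
    return probs


recordings = [["phee", "phee", "cry", "twitter"], ["cry", "cry", "phee"]]
print(transition_probabilities(recordings))
```

      In this toy example, "cry" is followed once each by "twitter", "cry" and "phee", so each of those transitions receives probability 1/3; "twitter" never precedes another call and therefore has no row.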

      Figure 3E) How is the ratio on the y-axis calculated here?

      The y-axis represents the average, per subject and group, of the ratio of social contact calls to non-social contact calls in each recording, i.e., x̄(# social calls / # non-social calls). This is now included in the figure legend, and the axis has been updated (page 32).
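      Written out as a formula, one reading of this definition is given here; the symbols are introduced for illustration only and do not appear in the manuscript. For subject s, let S_{s,r} and N_{s,r} be the numbers of social and non-social contact calls in recording r, and let R_s be that subject's number of recordings. The group value is then the mean over the subjects in group G:

$$
\rho_s = \frac{1}{R_s}\sum_{r=1}^{R_s}\frac{S_{s,r}}{N_{s,r}}, \qquad \bar{\rho}_G = \frac{1}{|G|}\sum_{s \in G}\rho_s
$$

      Averaging per-recording ratios in this way weights every recording equally. This is not the same as dividing a group's total social calls by its total non-social calls. The ratio is also undefined for any recording with no non-social calls (N_{s,r} = 0).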

      Also, cries could be considered a 'social contact call' since they are produced by infants to elicit responses from the parents. There is also the hypothesis in the literature that cries transition into phees.

      The reviewer is correct. Cries are often considered a social contact call because they elicit parental feedback. We decided to separate cry-calls from other social contact calls for two reasons. First, in our sample, we found cry behavior to be highly variable across the animals. For example, one control infant cried incessantly whereas another control infant cried less than normal. This extreme variability within the same group masked features that would otherwise reliably differentiate the groups. Second, cry-calls elicit feedback from parents who are normally in the vicinity of the infant, whereas phee calls elicit antiphonal phee calls from distantly located conspecifics. In other words, the contexts in which these calls are elicited are very different.

      The use of 'syntactical' is a bit jarring to me because outside of linguistics, its use in animal communication generally refers to meaning-bearing units that can be combined into well-formed complexes such as pod-specific whale songs or predator alarm calls with concatenated syllable types in some species of monkeys. To my knowledge, individual phee syllables have not been currently shown to carry information on their own and may be better described as 'sequential' rather than 'syntactical'.

      We agree. We have made this change accordingly.

      Figure 4B) How many phee calls with differing numbers of syllables are present each week? How equal is the distribution given that later analyses go up to 5 syllables?

      The total number of phee calls (of any syllable length) ranged between 20 and 40 and varied between subjects and across weeks. The most common were 3- and 4-syllable phee calls, which ranged from 7 to 15. Due to this variability, Fig. 4B presents the average syllable count. The axis has now been updated.

      Figure 4C-E) How are the data combined here? Is the 2nd syllable, for example, the combined data from the 2nd syllable of phee calls of all lengths (1-5)? If so, are there differences based on how long the total sequence is?

      The combined data represent a given syllable position (e.g., the 1st syllable in a 2-syllable phee, in a 3-syllable phee and in a 4-syllable phee) irrespective of the length of the sequence. No differences were observed between the 2nd syllable in a 2-syllable phee and the 2nd syllable in a 3- or 4-syllable phee. We have included this detail in the figure legend (pages 33-34).

      Duration is a vocal parameter that is highly dependent on physical factors such as body size and lung volume; were there differences in physical growth between the ACC-lesioned marmosets and their twins? Entropy is less closely tied to these physical factors but has previously been shown to decrease as phee calls mature, which we can also see in the negative relationship in the control animals. Do you know of experiments that show that lower-entropy calls are more 'blunted'?

      Thank you for raising the important issue of physical growth factors. For twin pairs, it is not uncommon for one infant to be slightly bigger, heavier or stronger than the other, presumably because one gets more access to food. With increasing age, we did not observe significant differences in body weight between the groups. We examined grip strength in all infants as a means of assessing how well each infant was able to access food during nursing. Poor grip strength would indicate a lower propensity to ‘hang on’ to the mother for nursing, which could lead to lower weight gain and reduced physical growth. We found that both grip strength and body weight increased as the infants got older and that both parameters were equivalent between groups. We have added a supplementary figure showing the normal increase in both weight and grip strength (Fig. S3) and refer to it in the text (page 8).

      As for entropy, its impact on the emotional quality of vocalizations has not been systematically explored. Generally speaking, high entropy relates to high randomness and distortion in the signal. Accordingly, one view posits that low-entropy phee calls represent mature-sounding calls relative to noisy, immature high-entropy calls (e.g., Takahashi et al., 2017). In the current study, the reduction in syllable entropy observed in both groups with increasing age is consistent with this view. At the same time, entropy can relate to vocal complexity: high entropy refers to complex and variable sound patterns, whereas low-entropy sounds are predictable, less diverse and simple vocal sequences (Kershenbaum, A. 2013. Entropy rate as a measure of animal vocal complexity. Bioacoustics, 23(3), 195–208). One possibility is that call maturity does not equate directly to emotional quality. In other words, a low-entropy mature call can also lack emotion, as observed in humans with ACC damage; these patients show mature speech but lack the variations in rhythm, pattern and intonation (i.e., prosody) that would normally convey emotional salience and meaning. Our observation of reduced phee syllable entropy in the ACC group, together with calls that were short and loud with reduced peak frequency, is consistent with this view. Our use of the word ‘blunt’ was meant to convey that the calls of the ACC group were potentially lacking emotional meaning. Beyond this speculation, we are not aware of any papers that have examined the relationship between entropy and blunted calls directly. We have now included this speculation in the discussion (pages 12-13).

      Reviewer #2 (Public Review):

      The authors state that the integrity of white matter tracts at the injection site was impacted but do not show data.

      We have added representative micrographs of a control and ACC-lesioned animal in a new supplementary figure which shows the neurotoxin impacted the integrity of white matter tracts local to the site of the lesion (Fig. S2).

      The study only provides data up to the 6th week after birth. Given the plasticity of the cortex, it would be interesting to see if these impairments in vocal behavior persist throughout adulthood or if the lesioned marmosets will recover their social-vocal behavior compared to the control animals.

      We agree. Our original intention was to examine behavior into adulthood. Unfortunately, the COVID-19 pandemic compromised the continuation of the study. We were limited by the data that we were allowed to acquire due to imposed restrictions. Some non-vocalization data collected when the animals were young adults is currently being prepared for another paper.

      Even though this study focuses entirely on the development of social vocalizations, providing data about altered social non-vocal behaviors that accompany ACC lesions is missing. This data can provide further insights and generate new hypotheses about the exact role of ACC in social vocal development. For example, do these marmosets behave differently towards their conspecifics or family members and vice versa, and is this an alternate cause for the observed changes in social-vocal development?

      We agree. At the time, however, apparatus for assessing behavior between the infant and its family or non-family members was not available. Assessing such behaviors in the animals' holding room posed some difficulty, since marmosets are easily distracted by other animals and by the presence of an experimenter, amongst other things. This is an area of investigation we are currently pursuing.

      Reviewer #3 (Public Review):

      It is striking to find that the vocal repertoire of infant marmosets was not significantly affected by ACC lesions. During development, the neural circuits are still maturing and the role of different brain regions may evolve over time. While the ACC likely contributes to vocalizations across the lifespan, its relative importance may vary depending on the developmental stage. In neonates, vocalizations may be more reflexive or driven by physiological needs. At this stage, the ACC may play a role in basic socioemotional regulation but may not be as critical for vocal production. Since the animals lived for two years, further analysis might be helpful to elucidate the precise role of ACC in the vocal behavior of marmosets.

      Figure 3D. According to the Introduction "...infant ACC lesions abolish the characteristic cries that infants normally issue when separated from its mother". Are the present results in marmosets showing the opposite effect? Please discuss.

      To date, the work of MacLean (1985) is the only publication describing the effect of early cingulate ablation on the spontaneous production of ‘separation calls’, broadly construed as cries, coos and whimpers in response to maternal separation. This work was performed largely in rhesus macaques or squirrel monkeys. In addition to ablating the cingulate cortex, MacLean found that it was necessary to ablate the subcallosal (area 25) and preseptal cingulate cortex (presumably referring to prelimbic area 32) to permanently eliminate the spontaneous production of separation cry calls. Our ablation was more circumscribed, confined to area 24 of the ACC, and our results are therefore consistent with MacLean’s earlier finding that removal of the ACC alone does not eliminate cry behavior. In adults, too, ACC ablation is insufficient to eliminate vocalization. We refer to this on pages 13-14 of the discussion.

      Figure 3E and Discussion. Phees are mature contact calls and cries are immature contact calls (Zhang et al, 2019, Nat Commun). Therefore, I would rather say that the proportion of immature (cries) contact calls increases vs the mature (phee, trill, twitters) contact calls in the ACC group. Cries are also "isolated-induced contact calls" to attract the attention of the caregivers.

      The reviewer is correct that cries are directed towards caregivers, but in our sample cry behavior was highly variable between infants. Consequently, in Fig. 3E, social contact calls include phee, twitter and trill calls but not cries, which were analyzed separately (see also response to reviewer #1). Many of the calls made during babbling were immature in their spectral pattern (compare phee calls between Fig. 3A and 3B). Cries typically transitioned into phees, twitters or trills before they fully matured. Fig. 3E shows that the controls made more isolation-induced social contact calls at postnatal week 6, which were presumably maturing at this time point. Thus, if anything, the proportion of mature relative to immature contact calls increased with age.

      Figure 4D. Animal location and head direction within the recording incubator can have significant effects on the perceived amplitude of a call. Were these factors taken into account?

      The reviewer makes an excellent observation. Unfortunately, we did not account for location and head direction because the infants were quite mobile in the incubator. The directional microphone was positioned ~12 cm from the marmoset, hidden from view because the infants were distracted by it, and placed in exactly the same location for every recording. In addition, calls with phantom frequencies were eliminated during visual inspection of spectrograms. Beyond these measures, location and head direction were not taken into account.

      Figure 4E. When a phee call has a higher amplitude, as is the case for the ACC group (Figure 4D), the energy of the signal will be concentrated more strongly at the phee call frequency ~8 kHz. This concentration of the energy reduces the variability in the frequency distribution, leading to lower entropy. The interpretation of the results should be reconsidered. A faint call (control group) can exhibit more variability in the frequency content since the energy is distributed across a wider range of frequencies contributing to higher entropy. It can still be "fixed, regular, and stereotyped" if the behavior is consistent or predictable with little variation. Also, to define ACC calls as "monotonic" I would rather search for the lack of frequency modulation, amplitude variation, or narrower bandwidth.

      We very much appreciate this explanation. We were able to identify the maximum frequency that closely matched the pitch of each syllable in a multisyllabic phee. New Fig. 4E shows that the peak frequency of each phee syllable was lower in the ACC-lesioned monkeys, which may directly translate into the low entropy observed in this group. The term “monotonic” was used to relate our data to the classical and long-standing evidence of human ACC lesions causing monotonous intonation of speech. When all factors are taken into account, it is evident that the vocal phee signature of the ACC-lesioned animals was structurally different from that of the controls, implicating a less complex and more stereotyped signal in the ACC group. Further studies are needed to systematically explore the relationship between entropy and the emotional quality of vocalizations.
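      The reviewer’s point that concentrating energy at one frequency lowers entropy can be illustrated with a minimal sketch. It assumes spectral entropy is the Shannon entropy of the power spectrum normalized to sum to one; the study may use a different entropy measure (e.g., Wiener entropy), and the function name and the two spectra below are hypothetical.

```python
import math


def spectral_entropy(power):
    """Shannon entropy (bits) of a power spectrum normalized to a probability
    distribution. Assumed definition for illustration, not the study's method."""
    total = sum(power)
    probs = [p / total for p in power if p > 0]  # zero-power bins contribute nothing
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical 8-bin spectra (arbitrary units); bin 4 stands in for the ~8 kHz phee band.
narrowband = [0.1, 0.1, 0.2, 9.0, 0.2, 0.1, 0.1, 0.1]  # energy concentrated at one frequency
broadband = [1.0, 1.2, 1.1, 1.5, 1.1, 1.0, 1.2, 0.9]   # energy spread across frequencies

print(round(spectral_entropy(narrowband), 3))
print(round(spectral_entropy(broadband), 3))
```

Because the spectrum is normalized, multiplying every bin by a constant leaves this entropy unchanged: under this definition, entropy falls when energy is concentrated in fewer frequency bins, not because a call is louder per se.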

      Apart from the changes in the vocal behavior, did the ACC lesions manifest in any other observable cognitive, emotional, or social behavior? ACC plays a role in processing pain and modulating pain perception. Could that be the reason for the observed increase in the proportion of cries in the ACC group and the increase in the phee call amplitude? Did the cries in the ACC group also display a higher amplitude than the cries in the control group?

      It was our intention to acquire as much data as possible from these infants as they matured from a cognitive, social and emotional perspective. Unfortunately, our study was hampered by a variety of factors, including the COVID-19 pandemic, which imposed major restrictions on our ability to continue the experiment in a time-sensitive manner. In addition, the development and construction of the custom apparatus to measure these behaviors stalled during this period, further preventing us from collecting behavioral data at regular time intervals. As for cry behavior, the number of cries in the ACC group was very low, especially at postnatal weeks 5 and 6. Consequently, there were very few data points to work with.

      Discussion. Louder calls have the potential to travel longer distances compared to fainter calls, possess higher energy levels, and can propagate through the environment more effectively. If the ACC group produced louder phee syllables, how could the message conveyed over long distances be "deficient, limited, and/or indiscriminate"?

      Thanks for raising this interesting point. Not all calls emitted by the animals were loud; we specifically examined the long-distance phee call in this regard. The phee syllables emitted by the ACC group were high in amplitude but had low frequencies, short duration and low entropy. Taking these factors into account, it is conceivable that the phee calls produced by the ACC group could not effectively convey their message over long distances despite propagating through the environment. We refer to this in the discussion, where we focus specifically on phee calls (page 12).

      Abstract: Do marmosets have syntax? Consider replacing "syntactical" with a more appropriate term (maybe "syntax-like").

      Thanks for this suggestion. We have replaced the term syntactical with ‘sequential’ as per the recommendation of reviewer #1.

      Introduction: "...cries that infants normally issue when separated from its mother". Please replace "its" with "their".

      This has been corrected.

      Results: Is the reference to Fig 1B related to the text?

      We have included and referred to Fig. 1B in the text (results and methods) to show other researchers how they can use this technique as a reliable and safe means of monitoring tidal volume under anesthesia in small infant marmosets without intubation.

      I understand that both "spectrograph" and "spectrogram" are used to analyze the frequency content of a signal. Nevertheless, "spectrogram" refers to the visual representation of the frequency content of a signal over time, and this term is commonly used in audio signal processing and specifically in the vocal communication field. I would recommend replacing "spectrograph" with "spectrogram".

      Thanks for this suggestion. We have corrected this throughout the manuscript.

      (Concerning the previous comment in the public review). Cries are uttered to attract the attention of the caregivers. The increase in the proportion of cries in the ACC group does not match the sentence: "...these infants appeared to make little effort in using vocalizations to solicit social contact when socially isolated".

      We apologize for the confusion. It is not the case that the ACC animals made more cries. Cry calls were highly variable among the animals. Consequently, although Fig. 3D gives the impression that the proportion of cries is higher in ACC animals, it did not differ significantly from that of the controls. Because of their high variability, cries were excluded from the measurement of social contact. Accordingly, Fig. 3E does not include cry behavior; it shows that the ACC animals engage less in social contact calls.

      Related to Figure 3. What is the difference between "egg" and "eck" calls? Do you mean "ock"?

      We apologize. This was a typo. It should be ock calls.

      Figure 4B. Is the sample size five animals per group and per week? Overlapping data points seem to be placed next to each other. Why are fewer than five dots visible in some groups (e.g., ACC 6 weeks)?

      The sample size differed by week because recordings could not be made during the COVID restrictions. In Fig. 4B, we have now separated the overlapping dots. We have also added the sample size of each group to the figure legends.

      Would the authors expect to see stronger differences between the lesioned and the control groups when comparing a later developmental stage? The animals were euthanized at the age of

      This speculation is certainly plausible, and yes, we were hoping to establish this level of detail by testing at later developmental stages. This is an aspect of development we are currently pursuing.

      Could these experiments be conducted?

      I’m afraid these animals are no longer available, but we are currently conducting experiments in other animals with early-life neurochemical manipulations that show behavioral changes into early adulthood.

      ACC lesion: It is reported that the lesions extended past 24b into motor area 6M. Did the animal display any motor control disability?

      Surprisingly, despite the lesion encroaching into 6M, these animals showed no observable motor impairment. We assessed the animals’ grip strength and body weight and found normal strength and weight gain in both the control and lesioned groups. We have added these data as supplemental information (Fig. S3).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This study investigates what happens to the stimulus-driven responses of V4 neurons when an item is held in working memory. Monkeys are trained to perform memory-guided saccades: they must remember the location of a visual cue and then, after a delay, make an eye movement to the remembered location. In addition, a background stimulus (a grating) is presented that varies in contrast and orientation across trials. This stimulus serves to probe the V4 responses, is present throughout the trial, and is task-irrelevant. Using this design, the authors report memory-driven changes in the LFP power spectrum, changes in synchronization between the V4 spikes and the ongoing LFP, and no significant changes in firing rate.

      Strengths:

      (1) The logic of the experiment is nicely laid out.

      (2) The presentation is clear and concise.

      (3) The analyses are thorough, careful, and yield unambiguous results.

      (4) Together, the recording and inactivation data demonstrate quite convincingly that the signal stored in FEF is communicated to V4 and that, under the current experimental conditions, the impact from FEF manifests as variations in the timing of the stimulus-evoked V4 spikes and not in the intensity of the evoked activity (i.e., firing rate).

      Weaknesses:

      I think there are two limitations of the study that are important for evaluating the potential functional implications of the data. If these were acknowledged and discussed, it would be easier to situate these results in the broader context of the topic, and their importance would be conveyed more fairly and transparently.

      (1) While it may be true that no firing rate modulations were observed in this case, this may have been because the probe stimuli in the task were behaviorally irrelevant; if anything, they might have served as distracters to the monkey's actual task (the MGS). From this perspective, the lack of rate modulation could simply mean that the monkeys were successful in attending the relevant cue and shielding their performance from the potentially distracting effect of the background gratings. Had the visual probes been in some way behaviorally relevant and/or spatially localized (instead of full field), the data might have looked very different.

      Any task design involves tradeoffs; if the visual stimulus was behaviorally relevant, then any observed neurophysiological changes would be more confounded by possible attentional effects. We cannot exclude the possibility that a different task or different stimuli would produce different results; we ourselves have reported firing rate enhancements for other types of visual probes during an MGS task (Merrikhi et al. 2017). We have added an acknowledgement of these limitations in the discussion section (lines 323-330 in untracked version). At minimum, our results show a dissociation between the top-down modulation of phase coding, which is enhanced during WM even for these task-irrelevant stimuli, and rate coding. Establishing whether and how this phase coding is related to perception and behavior will be an important direction for future work.

      With this in mind, it would be prudent to dial down the tone of the conclusions, which stretch well beyond the current experimental conditions (see recommendations).

      We have edited the title (removing the word ‘primarily’) and key sentences throughout to tone down the conclusions, generally to state that the importance of a phase code in WM modulations is *possible* given the observed results, rather than certain (see abstract lines 26-27, introduction lines 59-62, conclusion lines 310-311).

      (2) Another point worth discussing is that although the FEF delay-period activity corresponds to a remembered location, it can also be interpreted as an attended location, or as a motor plan for the upcoming eye movement. These are overlapping constructs that are difficult to disentangle, but it would be important to mention them given prior studies of attentional or saccade-related modulation in V4. The firing rate modulations reported in some of those cases provide a stark contrast with the findings here, and I again suspect that the differences may be due at least in part to the differing experimental conditions, rather than a drastically different encoding mode or functional linkage between FEF and V4.

      We have added a paragraph to the discussion section addressing links to attention and motor planning (lines 315-333), and specifically acknowledging the inherent difficulties of fully dissociating these effects when interpreting our results (lines 323-330).

      Reviewer #2 (Public review):

      Summary:

      It is generally believed that higher-order areas in the prefrontal cortex guide selection during working memory and attention through signals that selectively recruit neuronal populations in sensory areas that encode the relevant feature. In this work, Parto-Dezfouli and colleagues tested how these prefrontal signals influence activity in visual area V4 using a spatial working memory task. They recorded neuronal activity from visual area V4 and found that information about visual features at the behaviorally relevant part of space during the memory period is carried in a spatially selective manner in the timing of spikes relative to a beta oscillation (phase coding) rather than in the average firing rate (rate code). The authors further tested whether there is a causal link between prefrontal input and the phase encoding of visual information during the memory period. They found that indeed inactivation of the frontal eye fields, a prefrontal area known to send spatial signals to V4, decreased beta oscillatory activity in V4 and information about the visual features. The authors went one step further to develop a neural model that replicated the experimental findings and suggested that changes in the average firing rate of individual neurons might be a result of small changes in the exact beta oscillation frequency within V4. These data provide important new insights into the possible mechanisms through which top-down signals can influence activity in hierarchically lower sensory areas and can therefore have a significant impact on the Systems, Cognitive, and Computational Neuroscience fields.

      Strengths:

      This is a well-written paper with a well-thought-out experimental design. The authors used a smart variation of the memory-guided saccade task to assess how information about the visual features of stimuli is encoded during the memory period. By using a grating of various contrasts and orientations as the background the authors ensured that bottom-up visual input would drive responses in visual area V4 in the delay period, something that is not commonly done in experimental settings in the same task. Moreover, one of the major strengths of the study is the use of different approaches including analysis of electrophysiological data using advanced computational methods of analysis, manipulation of activity through inactivation of the prefrontal cortex to establish causality of top-down signals on local activity signatures (beta oscillations, spike locking and information carried) as well as computational neuronal modeling. This has helped extend an observation into a possible mechanism well supported by the results.

      Weaknesses:

      Although the authors provide support for their conclusions from different approaches, I found that the selection of some of the analyses and statistical assessments made it harder for the reader to follow the comparison between a rate code and a phase code. Specifically, the authors wish to assess whether stimulus information is carried selectively for the relevant position through a firing rate or a phase code. Results for the rate code are shown in Figures 1B-G and for the phase code are shown in Figure 2. Whereas an F-statistic is shown over time in Figure 1F (and Figure S1) no such analysis is shown for LFP power. Similarly, following FEF inactivation there is no data on how that influences V4 firing rates and information carried by firing rates in the two conditions (for positions inside and outside the V4 RF). In the same vein, no data are shown on how the inactivation affects beta phase coding in the OUT condition.

      Per the reviewer’s suggestion, we have added several new supplementary figures. We now show the F-statistic for discriminability over time for the LFP timecourse (Fig. S2), and as a function of power in various frequencies (Fig. S4). We have added before/after inactivation comparisons of the LFP and spiking activity, and their respective F-statistics for discrimination between contrasts and orientations in Fig. S9. Lastly, we added a supplementary figure evaluating the impact of FEF inactivation on beta phase coding in the OUT condition, showing no significant change (Fig. S11).

      Moreover, some of the statistical assessments could be carried out differently including all conditions to provide more insight into mechanisms. For example, a two-way ANOVA followed by post hoc tests could be employed to include comparisons across both spatial (IN, OUT) and visual feature conditions (see results in Figures 2D, S4, etc.). Figure 2D suggests that the absence of selectivity in the OUT condition (no significant difference between high and low contrast stimuli) is mainly due to an increase in slope in the OUT condition for the low contrast stimulus compared to that for the same stimulus in the IN condition. If this turns out to be true it would provide important information that the authors should address.

      We have updated the STA slope measurement, excluding the low-contrast condition, which lacks a clear peak in the STA. Additionally, we equalized the bin widths and aligned the x-axes for better visual comparability. We then performed a two-way ANOVA analyzing the effects of spatial condition (IN vs. OUT) and visual feature (contrast and orientation). The results showed a significant effect of the visual feature for both orientation (F = 3.96, p = 0.046) and contrast (F = 14.26, p < 10⁻³). However, neither the spatial condition nor the spatial × visual interaction had a significant effect for orientation (spatial: F = 0.52, p = 0.473; interaction: F = 1.56, p = 0.212) or contrast (spatial: F = 2.19, p = 0.139; interaction: F = 1.15, p = 0.283).

      There are also a few conceptual gaps that leave the reader wondering whether the results and conclusion are general enough. Specifically,

      (1) The authors used microstimulation in the FEF to determine RFs. It is thus possible that the FEF sites that were inactivated were largely more motor-related. Given that beta oscillations and motor preparatory activity have been found to be correlated and motor sites show increased beta oscillatory activity in the delay period, it is possible that the effect of FEF inactivation on V4 beta oscillations is due to inactivation of the main source of beta activity. Had the authors inactivated sites with a preponderance of visual neurons in the FEF would the results be different?

      We do not believe this to be likely based on what is known anatomically and functionally about this circuitry. Anatomically, the projections from FEF to V4 arise primarily from the supragranular layers, not layers which contain the highest proportion of motor activity (Barone et al. 2000, Pouget et al. 2009, Markov et al. 2013). Functionally, based on electrical identification of V4-projecting FEF neurons, we know that FEF to V4 projections are predominantly characterized by delay rather than motor activity (Merrikhi et al. 2017). We have now tried to emphasize these points when we introduce the inactivation experiments (lines 185-186).

      Experimentally, the spread of the pharmacological effect with our infusion system is quite large relative to any clustering of visual vs. motor neurons within the FEF, with behavioral consequences of inactivation spreading to cover a substantial portion of the visual hemifield (e.g., Noudoost et al. 2014, Clark et al. 2014), and so our manipulation lacks the spatial resolution to selectively target motor vs. other FEF neurons.

      (2) Somewhat related to this point and given the prominence of low-frequency activity in deeper layers of the visual cortex according to some previous studies, it is not clear where the authors' V4 recordings were located. The authors report that they do have data from linear arrays, so it should be possible to address this.

      Unfortunately, our chamber placement for V4 has produced linear array penetration angles which do not reliably allow identification of cortical layers. We are aware of previous results showing layer-specific effects of attention in V4 (e.g., Pettine et al. 2019, Buffalo et al. 2011), and it would indeed be interesting to determine whether our observed WM-driven changes follow similar patterns. We may be able to analyze a subset of the data with current source density analysis to look for layer-specific effects in the future, but are not able to provide any information at this time.

      (3) The authors suggest that a change in the exact frequency of oscillation underlies the increase in firing rate for different stimulus features. However, the shift in frequency is prominent for contrast but not for orientation, something that raises questions about the general applicability of this observation for different visual features.

      While the shift in peak frequency across contrasts is more prominent than that across orientations (Fig. S3A-B), the relationship between orientation and peak frequency is also significant (one-way ANOVA for peak frequency across contrasts, F_Contrast = 10.72, p < 10⁻⁴; across orientations, F_Orientation = 3, p = 0.030; statistics have been added to the Fig. S3 caption). This finding also aligns with previous studies, which reported slight peak frequency shifts (~1–2 Hz) in the context of attention (Fries, 2015). To address whether the frequency-firing rate correlation generalizes to orientation-driven changes, we now examine the relationship between peak frequency and firing rate separately for each contrast level (Fig. S14). The average normalized response as a function of peak frequency, pooled across subsamples of trials from each of 145 V4 neurons (100 subsamples/neuron) and computed for IN and OUT conditions, shows a significant correlation during the delay period at each contrast (ANCOVA; low contrast: F_Condition = 0.03, p = 0.867; F_Frequency = 141.86, p < 10⁻¹⁸; F_Interaction = 10.70, p = 0.002; middle contrast: F_Condition = 7.18, p = 0.009; F_Frequency = 96.76, p < 10⁻¹⁴; F_Interaction = 0.13, p = 0.716; high contrast: F_Condition = 12.51, p = 0.001; F_Frequency = 333.74, p < 10⁻²⁹; F_Interaction = 7.91, p = 0.006).

      (4) One of the major points of the study is the primacy of the phase code over the rate code during the delay period. Specifically, here it is shown that information about the visual features of a stimulus carried by the rate code is similar for relevant and irrelevant locations during the delay period. This contrasts with what several studies have shown for attention in which case information carried in firing rates about stimuli in the attended location is enhanced relative to that for stimuli in the unattended location. If we are to understand how top-down signals work in cognitive functions it is inevitable to compare working memory with attention. The possible source of this difference is not clear and is not discussed. The reader is left wondering whether perhaps a different measure or analysis (e.g. a percent explained variance analysis) might reveal differences during the delay period for different visual features across the two spatial conditions.

      We have added discussion regarding the relationship of these results to previous findings during attention in the discussion section (lines 315-333).

      The use of the memory-guided saccade task has certain disadvantages in the context of this study. Although delay activity is interpreted as memory activity by the authors, it is in principle possible that it reflects preparation for the upcoming saccade, spatial attention (particularly since there is a stimulus in the RF), etc. This could potentially change the conclusion and perspective.

      We have added a new discussion paragraph addressing the relationship to attention and motor planning (lines 315-333). We have also moderated the language used to describe our conclusions throughout the manuscript in light of this ambiguity.

      For the position outside the V4 RF, there is a decrease in both beta oscillations and the clustering of spikes at a specific phase. It is therefore possible that the decrease in information about the stimuli features is a byproduct of the decrease in beta power and phase locking. Decreased oscillatory activity and phase locking can result in less reliable estimates of phase, which could decrease the mutual information estimates.

      Taking the SNR as the ratio of power in the beta band to power in all other bands, there is no significant drop in SNR between conditions (SNR_IN = 4.074 ± 0.984, SNR_OUT = 4.333 ± 0.834, p = 0.341, Wilcoxon signed-rank test). Therefore, we do not think that the change in phase coding is merely a result of less reliable phase estimates.
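      The beta-band SNR described above can be sketched as a simple power ratio. This is a minimal illustration only: the band edges (12–30 Hz), the 1–100 Hz analysis range, the use of mean rather than summed power, and the function name `beta_power_ratio` are all assumptions, not the definition used in the study. The paired IN vs. OUT comparison (Wilcoxon signed-rank test) is not shown.

```python
def beta_power_ratio(freqs, power, beta=(12.0, 30.0), f_min=1.0, f_max=100.0):
    """Mean power in the beta band divided by mean power in all other bins
    within [f_min, f_max]. Band edges, range, and use of means are assumptions."""
    lo, hi = beta
    in_band, out_band = [], []
    for f, p in zip(freqs, power):
        if f < f_min or f > f_max:
            continue  # ignore bins outside the analysis range
        (in_band if lo <= f <= hi else out_band).append(p)
    if not in_band or not out_band:
        raise ValueError("need at least one bin inside and outside the beta band")
    return (sum(in_band) / len(in_band)) / (sum(out_band) / len(out_band))


# Hypothetical 1-Hz-resolution spectrum: flat baseline with a 4x beta bump.
freqs = list(range(1, 101))
power = [4.0 if 12 <= f <= 30 else 1.0 for f in freqs]
print(beta_power_ratio(freqs, power))  # prints 4.0
```

Because the ratio is normalized by out-of-band power, an overall change in broadband power leaves it unchanged; it tracks how prominent beta is relative to the rest of the spectrum, which is what matters for the reliability of beta phase estimates.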

      The authors propose that coherent oscillations could be the mechanism through which the prefrontal cortex influences beta activity in V4. I assume they mean coherent oscillations between the prefrontal cortex and V4. Given that they do have simultaneous recordings from the two areas they could test this hypothesis on their own data, however, they do not provide any results on that.

      This paper only includes inactivation data. We are working on analyzing the simultaneous recording data for a future publication.

      The authors make a strong point about the relevance of changes in the oscillation frequency and how this may result in an increase in firing rate although it could also be the reverse - an increase in firing rate leading to an increase in the frequency peak. It is not clear at all how these changes in frequency could come about. A more nuanced discussion based on both experimental and modeling data is necessary to appreciate the source and role (if any) of this observation.

      As the reviewer notes, it is difficult to determine whether the frequency changes drive the rate changes, vice versa, or whether both are generated in parallel by a common source. We have adjusted our language to reflect this (lines 291-293). Future modeling work may be able to shed more light on the causal relationships between various neural signatures.

      Reviewer #3 (Public review):

      Summary:

      In this report, the authors test the necessity of prefrontal cortex (specifically, FEF) activity in driving changes in oscillatory power, spike rate, and spike timing of extrastriate visual cortex neurons during a visual-spatial working memory (WM) task. The authors recorded LFP and spikes in V4 while macaques remembered a single spatial location over a delay period during which task-irrelevant background gratings were displayed on the screen with varying orientation and contrast. V4 oscillations (in the beta range) scaled with WM maintenance, and the information encoded by spike timing relative to beta band LFP about the task-irrelevant background orientation depended on remembered location. They also compared recorded signals in V4 with and without muscimol inactivation of FEF, demonstrating the importance of FEF input for WM-induced changes in oscillatory amplitude, phase coding, and information encoded about background orientations. Finally, they built a network model that can account for some of these results. Together, these results show that FEF provides meaningful input to the visual cortex that is used to alter neural activity and that these signals can impact information coding of task-irrelevant information during a WM delay.

      Strengths:

      (1) Elegant and robust experiment that allows for clear tests for the necessity of FEF activity in WM-induced changes in V4 activity.

      (2) Comprehensive and broad analyses of interactions between LFP and spike timing provide compelling evidence for FEF-modulated phase coding of task-irrelevant stimuli at remembered location.

      (3) Convincing modeling efforts.

      Weaknesses:

      (1) 0% contrast background data (standard memory-guided saccade task) are not reported in the manuscript. While these data cannot be used to consider information content of spike rate/time about task-irrelevant background stimuli, this condition is still informative as a 'baseline' (and a more typical example of a WM task).

      We have added a new supplementary figure to show the effect of WM on V4 LFP power and SPL in 0% contrast trials (Fig. S6). These results (increases in beta LFP power and SPL when remembering the V4 RF location) match our previous report for the effect of spatial WM on LFP power and SPL within extrastriate area MT (Bahmani et al. 2018).

      (2) Throughout the manuscript, the primary measurements of neural coding pertain to task-irrelevant stimuli (the orientation/contrast of the background, which is unrelated to the animal's task to remember a spatial location). The remembered location impacts the coding of these stimulus variables, but it's unclear how this relates to WM representations themselves.

      Indeed, here we have focused on how maintaining spatial WM impacts visual processing of incoming sensory information, rather than on how the spatial WM signal itself is represented and maintained. Behaviorally, this impact on visual signals could be related to the effects of the content of WM on perception and reaction times (e.g., Soto et al. 2008, Awh et al. 1998, Teng et al. 2019), but no such link to behavior is shown in our data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, the two points I raised in the public review merit a bit of development in the Discussion. In addition, the authors should revise some of their conclusions.

      For instance (L217):

      "The finding that WM mainly modulates phase coded information within extrastriate areas fundamentally shifts our understanding of how the top-down influence of prefrontal cortex shapes the neural representation, suggesting that inducing oscillations is the main way WM recruits sensory areas."

      In my opinion, this one is over-the-top on various counts.

      Here is another exaggerated instance (L298):

      "...leading us to conclude that representations based on the average firing rate of neurons are not the primary way that top-down signals enhance sensory processing."

      Again, as noted above, the problem is that one could make the case that the top-down signals are, in fact, highly effective, since they are completely quashing any distracter-related modulation in firing rate across RFs. There is only so much that one can conclude from responses to stimuli that are task-irrelevant, uniform across space, and constant over the course of a trial.

      I think even the title goes too far. What the work shows, by all accounts, is that the sustained activity in FEF has a definitive impact on V4 *even* with respect to a sustained, irrelevant background stimulus. The result is very robust in this sense. However, this is quite different from saying that the *primary* means of functional control for FEF is via phase coding. Establishing that would require ruling out other forms of control (i.e., rate coding) in all or a wide range of experimental conditions. That is far from the restricted set of conditions tested here and is also at variance with many other experiments demonstrating effects of attention or even FEF microstimulation on V4 firing activity.

      To reiterate, in my opinion, the work is carefully executed and the data are interesting and largely unambiguous. I simply take issue with what can be reliably concluded, and how the results fit with the rest of the literature. Revisions along these lines would improve the readability of the paper considerably.

      We have edited the title (removing the word ‘primarily’) and key sentences throughout to tone down the conclusions, generally to state that the importance of a phase code in WM modulations is *possible* given the observed results, rather than certain (see abstract lines 26-27, introduction lines 59-62, conclusion lines 310-311).

      Reviewer #3 (Recommendations for the authors):

      (1) My primary comment that came up multiple times as I read the manuscript (and which is summarized above) is that I was never sure why the authors focus on analyzing neural coding of task-irrelevant sensory information during a WM task as a function of WM contents (remembered location). Most studies of neural codes supporting WM focus on coding of the remembered information, not of other information. Conceptually, it seems that the brain would want to suppress, or at least not enhance, representations of task-irrelevant information when performing a demanding task, especially when there is no search requirement and no feature correspondence between the remembered and viewed stimuli (i.e., the interaction between WM and visual input is more obvious for visual search for a remembered target). Why, in theory, would a visual region need to improve its coding of non-remembered information as a function of WM? This isn't meant to detract from the results, which are indeed very interesting and, I think, quite informative. The authors are correct that this is certainly relevant for sensory recruitment models of WM (there's clear evidence for a role of feedback from PFC to extrastriate cortex), but what role, specifically, each region plays in this task is critical to describe clearly, especially given the task-irrelevance of the input. Put another way: what if the animal was remembering an oriented grating? In that case, MI between spike-based measures and orientation would be directly relevant to questions of neural WM representations, as the remembered feature would itself be modeled. But here, the focus seems to be on incidental coding.

      Indeed, here we have focused on how maintaining spatial WM impacts visual processing of incoming sensory information, rather than on how the spatial WM signal itself is represented and maintained. Behaviorally, this impact on visual signals could be related to the effects of the content of WM on perception and reaction times (e.g., Soto et al. 2008, Awh et al. 1998, Teng et al. 2019), but no such link to behavior is shown in our data.

      Whether similar phase coding is also used to represent the content of object WM (for example, if the animal was remembering an oriented grating), or whether phase coding is only observed for WM’s modulation of the representation of incoming sensory signals, is an important question to be addressed in future work.

      (2) Related to the above, the phrasing of the second sentence of the Discussion (lines 291-292) is ambiguous: do the authors mean that the FEF sends signals that carry WM content to V4, or that FEF sends projections to V4, and V4 has the WM content? As presently phrased, either is a reasonable interpretation, yet the two directly oppose one another (the next sentence clarifies, but I imagine the authors want to minimize any confusion).

      We have edited this sentence to read, “Within prefrontal areas, FEF sends direct projections to extrastriate visual areas, and activity in these projections reflects the content of WM.”

      (3) I'm curious about how the authors consider the spatial WM task here different from a cued spatial attention task. Indeed, both require sustained use of a location for further task performance. The section of the Discussion addressing similar results with attention (lines 307-311) presently just summarizes the similarities of results but doesn't offer a theoretical perspective for how/why these different types of tasks would be expected to show similar neural mechanisms.

      We have added a discussion of how these results relate to previous findings on attention (Discussion, lines 315-333).

      (4) As far as I can tell, there is no consideration of behavioral performance on the memory-guided saccade task (RT, precision) across the different stimulus background conditions. This should be reported for completeness, and to determine whether there is an impact of the (likely) task-irrelevant background on task performance. This analysis should also be reported for Figure 3's results characterizing how FEF inactivation disrupts behavior (if background conditions were varied, see point 7 below).

      We have added the effect of inactivation on behavioral RT and % correct across the different stimulus background conditions (Fig. S8). Background contrast and orientation did not impact either RT or % correct.

      (5) Results from Figure 2 (especially Figures 2A-B) concerning phase-locked spiking in V4 should be shown for 0%-contrast trials as well, as these trials better align with 'typical' WM tasks.

      We have added a new supplementary figure to show the effect of WM on V4 LFP power and SPL in 0% contrast trials (Fig. S6). These results (increases in beta LFP power and SPL) match our previous report for the effect of spatial WM on LFP power and SPL within extrastriate area MT (Bahmani et al. 2018).

      (6) The magnitude of the SPL difference in aggregate (Figure 2B) is much, much smaller than that of the example site shown (Figure 2A), such that Figure 2A's neuron does not appear to be visible on Figure 2B's scatterplot. Perhaps a more representative sample could be shown? Alternatively, the full range of the x/y axes in Figure 2B could be plotted to illustrate the full distribution.

      We have updated Fig. 2A with a more representative sample neuron.

      (7) I'm a bit confused about the FEF inactivation experiments. In the Methods (lines 512-513), the authors mention there was no background stimulus presented during the inactivation experiment, and instead, a typical 8-location MGS task was employed. However, in the results on pg 8 (Lines 201-214), and Figure 3G, the authors quantify a phase code MI. The previous phase code MI analysis was looking at MI between each spike's phase and the background stimulus - but if there's no background, what's used to compute phase code MI? Perhaps what they meant to write was that, in addition to the primary task with a manipulation of background properties, an 8-location MGS task was additionally employed.

      The reviewer is correct that both tasks were used after inactivation (the 8-location task to assess the spread of the behavioral effect of inactivation, and the MGS-background task for measuring MI). We have edited the methods text to clarify.

      (8) How is % Correct defined for the MGS task? (what is the error threshold? Especially for the results described in lines 192-193).

      The % correct is defined as the number of correctly completed trials divided by the total number of trials; the target window was a circle with a radius of 2 or 4 dva (depending on cue eccentricity). These details have been added to the Methods.

      (9) The paragraph from lines 183-200 describes a number of behavioral results concerning "scatter" and "RT". The RT shown seems extremely high, and perhaps is normalized; details of this normalization should be included in the Methods. The "scatter" is listed in dva, but it's not clear how scatter is quantified (standard deviation of the endpoint distribution? mean absolute error?), nor how target eccentricity is incorporated (as scatter is likely higher for greater target eccentricity).

      We have renamed ‘scatter’ to ‘saccade error’ in the text to match the figure, and now provide details in the Methods section. Both RT and saccade error are normalized for each session; details are now provided in the Methods. Since error was normalized for each session before performing population statistics, no other adjustment for eccentricity was made.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      The authors propose a new model of biologically realistic reinforcement learning in the direct and indirect pathway spiny projection neurons in the striatum. These pathways are widely considered to provide a neural substrate for reinforcement learning in the brain. However, we do not yet have a full understanding of mechanistic learning rules that would allow successful reinforcement-learning-like computations in these circuits. The authors outline some key limitations of current models and propose an interesting solution by leveraging learning with efferent inputs of selected actions. They show that the model simulations are able to recapitulate experimental findings about the activity profiles of these populations in mice during spontaneous behavior. They also show how their model is able to implement off-policy reinforcement learning.

      Strengths:

      The manuscript has been very clearly written and the results have been presented in a readily digestible manner. The limitations of existing models that motivate the presented work have been clearly presented, and the proposed solution seems very interesting. The novel contribution of the proposed model is the idea that different patterns of activity drive current action selection and learning. Not only does this allow the model to implement reinforcement learning computations well, but this suggestion may also have interesting implications regarding why some processes selectively affect ongoing behavior and others affect learning. The model is able to recapitulate some interesting experimental findings about various activity characteristics of dSPN and iSPN pathway neuronal populations in spontaneously behaving mice. The authors also show that their proposed model can implement off-policy reinforcement learning algorithms with biologically realistic learning rules. This is interesting since off-policy learning provides some unique computational benefits, and it is very likely that learning in neural circuits may, at least to some extent, implement such computations.

      We thank the reviewer for the positive comments.

      Weaknesses:

      A weakness in this work is that it isn’t clear how a key component in the model, an efference copy of selected actions, would be accessible to these striatal populations. The authors propose several plausible candidates, but future work may clarify the feasibility of this proposal.

      We agree that the biological substrate of the efference copy remains a key open question. We discuss potential pathways in the Discussion section of our manuscript and hope that future experimental studies clarify the question.

      Reviewer #2:

      Summary:

      The basal ganglia are often understood within a reinforcement learning (RL) framework, where dopamine neurons convey a reward prediction error that modulates cortico-striatal connections onto spiny projection neurons (SPNs) in the striatum. However, current models of plasticity rules are inconsistent with learning in a reinforcement learning framework.

      This paper proposes a new model that describes how distinct learning rules in direct and indirect pathway striatal neurons allow them to implement reinforcement learning models. It proposes that two distinct components of striatal activity affect action selection and learning. They show that the proposed implementation allows learning in simple tasks and is consistent with calcium imaging data from direct and indirect SPNs in freely moving mice.

      Strengths:

      Despite the success of reward prediction errors at characterizing the responses of dopamine neurons as the temporal difference error within an RL framework, the implementation of RL algorithms in the rest of the basal ganglia has been unclear. A key missing piece has been an RL implementation that is consistent with the distinction between direct and indirect SPNs. This paper proposes a new model that is able to learn successfully in simple RL tasks and explains recent experimental results.

      The authors show that their proposed model, unlike previous implementations, can perform well in RL tasks. The new model allows them to make experimental predictions. They test some of these predictions and show that the dynamics of dSPNs and iSPNs correspond to model predictions.

      More generally, this new model can be used to understand striatal dynamics across direct and indirect SPNs in future experiments.

      We thank the reviewer for the positive comments.

      Weaknesses:

      The authors could better characterize the reliability of their experimental predictions and better describe the parameters of some of the simulations.

      In addition to the descriptions in the Methods, we have provided code implementing the key features of our simulations, which should contribute to reproducibility of our results.

      The authors propose some ideas about how the specificity of the striatal efferent inputs could arise, but they should better highlight that this is a key feature of the model whose anatomical implementation has yet to be resolved.

      We have clarified in the Discussion section “Biological substrates of striatal efferent inputs” that these represent assumptions or predictions that have not yet been demonstrated experimentally.

      Reviewer #3:

      Summary:

      This paper points out an inconsistency between the role of the striatal spiny projection neurons of the indirect pathway (iSPNs) and the synaptic plasticity rule of these dopamine D2 receptor-expressing neurons, and proposes a novel, intriguing mechanism in which iSPNs are activated by the efference copy of the chosen action that they are supposed to inhibit.

      The proposed model was supported by simulations and analysis of the neural recording data during spontaneous behaviors.

      Strengths:

      Previous models suggested that striatal neurons learn action-value functions, but how the information about the chosen action is fed back to the striatum for learning was not clear. The authors point out that this is a fundamental problem for iSPNs, which are supposed to inhibit specific actions and whose synaptic inputs are potentiated by dopamine dips.

      The authors propose a novel hypothesis that iSPNs are activated by an efference copy of the selected action, which they are supposed to inhibit during action selection. Although the hypothesis is intriguing and seemingly unnatural, the authors demonstrated that a model based on it can circumvent the problem of iSPNs learning to disinhibit the actions associated with negative reward errors. They further showed, by analyzing the cell-type-specific neural recording data from Markowitz et al. (2018), that iSPN activities tend to be anti-correlated before and after action selection.

      We thank the reviewer for the positive comments.

      Weaknesses:

      It is not correct to call action-value learning using the externally selected action “off-policy.” Both the off-policy algorithm Q-learning and the on-policy algorithm SARSA update the action value of the chosen action, which can be different from the greedy action implied by the present action values. In standard reinforcement learning terminology, the on-policy/off-policy distinction concerns the action in the subsequent state: whether the update uses the value of the next action that is (to be) chosen, or that of the greedy choice as in equation (7).

      It is worth noting that this paper suggested that dopamine neurons encode on-policy TD errors: Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H (2006). Midbrain dopamine neurons encode decisions for future action. Nat Neurosci, 9, 1057-63. https://doi.org/10.1038/nn1743.

      We regret that we do not completely follow the reviewer’s comment. We use “off-policy” to refer to the fact that, considered in isolation, the basal ganglia reinforcement learning system that we model learns a target policy that may be distinct from the behavioral policy of the organism as a whole.

      It is also confusing to contrast TD learning and Q-learning, as the latter is considered one type of TD learning. The TD error signal based on the state value function (6) depends implicitly on the chosen action a_{t-1} through r_t and s_t, via the reward and state transition functions.

      We agree that this was confusing. We have therefore changed the places in our paper where we intended to refer to “TD learning of a value function V(s)” to specifically mention V(s), rather than just “TD learning.”

      It is not clear how interference between the activity patterns for action selection and for learning can be avoided, especially when actions are taken at short intervals or even overlap in time. How can the efference copy activation for the previous action be dissociated from the sensory-cued activation for the next action selection?

      The non-interference arises from the orthogonality of the difference (action selection) and sum (efference copy) modes, as described in Figure 3. However, we agree with the reviewer that the problem of temporal credit assignment, when many actions are taken before reward feedback is obtained, is present in our model, as in any standard RL model.

      Although it may be difficult to single out the neural pathway that carries the efference copy signal to the striatum, it would be desirable to consider the requirements for such a pathway and the different possibilities. A major issue is that the time delay from actions to reward feedback can be highly variable.

      An interesting candidate is the long-latency neurons in the CM thalamus projecting to striatal cholinergic interneurons, which are activated following low-reward actions: Minamimoto T, Hori Y, Kimura M (2005). Complementary process to response bias in the centromedian nucleus of the thalamus. Science, 308, 1798-801. https://doi.org/10.1126/science.1109154.

      We are grateful for the interesting suggestion and reference, which we have added to the manuscript. However, we note that the issue of delayed reward feedback may also be partially addressed by using a sufficiently long eligibility trace.

      In the paragraph before Eq. (3), Eq. (1) should be Eq. (2) for the iSPN.

      Corrected.

    1. Author response:

      eLife Assessment

      This manuscript offers important insights into how polyphosphate (polyP) influences protein phase separation differently from DNA. The authors present compelling evidence that polyP distinguishes between protein conformational states, leading to diverse condensate behaviors. However, differences in charge density between polyP and DNA complicate direct comparisons, and the extent to which polyP-driven phase transitions reveal initial protein states remains unclear. Addressing these concerns would strengthen the manuscript's impact for researchers interested in biomolecular condensates, protein dynamics, and stress response mechanisms.

      We thank the editorial team for the favorable assessment. We do, however, dispute the specific point about the difference in charge density. We have already performed experiments in which a higher concentration of DNA is used to match the overall ‘concentration of charges’ in the experiments with polyP (see Figure S6), and we observe no differences in the maturation behavior with DNA, i.e., we see only dissolution at both higher and lower concentrations of DNA. Charge density (i.e., the number of charges per unit volume of the polymer), on the other hand, is an intrinsic feature of the polymer, which naturally differs between DNA and polyP. In fact, the primary result of our work is our observation that polyP can discern the starting ensembles more efficiently, likely by actively engaging and interacting with the ensemble, while DNA appears to be a passive player.

      Reviewer #1 (Public review):

      Summary:

      In the article titled "Polyphosphate discriminates protein conformational ensembles more efficiently than DNA promoting diverse assembly and maturation behaviors," Goyal and colleagues investigate the role that negatively charged biopolymers, i.e., polyphosphate (polyP) and DNA, play in the phase separation of cytidine repressor (CytR) and fructose repressor (FruR). The authors find that both negatively charged polymers drive the formation of metastable protein/polymer condensates. However, polyP-driven condensates form more gel- or solid-like structures over time, while DNA-driven condensates tend to dissipate over time. The authors link this disparate condensate behavior to polyP-induced structures within the proteins. Specifically, they observe the formation of polyproline II-like structures within two tested protein variants in the presence of polyP. Together, their results provide unique insight into the physical and structural mechanism by which two distinct negatively charged polymers can induce distinct phase transitions with the same protein. This study will be a welcome addition to the condensate field and provide new molecular insights into how binding partner-induced structural changes within a given protein can affect the mesoscale behavior of condensates. The concerns outlined below are meant to strengthen the manuscript.

      Strengths:

      Throughout the article, the authors used the correct techniques to probe physical changes within proteins that can be directly linked to phase transition behaviors. Their rigorous experiments create a clear picture of what occurs at the molecular level when CytR and FruR are exposed to either DNA or polyP, which are distinct, highly negatively charged biopolymers found within bacteria. This work provides a new view of mechanisms by which bacteria can regulate cytoplasmic organization upon the induction of stress. Furthermore, this is likely applicable to mammalian and plant cells, and to numerous proteins that undergo condensation with nucleic acids and other charged biopolymers.

      Weaknesses:

      The biggest weakness of this study is that it compares the phase behavior of proteins driven by negatively charged polymers that have intrinsic differences in net charge and charge density. Because these properties are extremely important for controlling phase separation, such differences may underlie the distinct phase transitions observed with DNA and polyP. The authors should perform an additional experiment to control for these differences as best they can. The results from these experiments will provide additional insight into the importance of charge-based properties for controlling phase transitions.

      We thank the reviewer for providing a positive review of our work. On the comment in the final paragraph, we note that we have already conducted an experiment with a higher DNA concentration (11.24 µM) to explore whether the concentration of charges plays any significant role. The results of this experiment are presented in Figure S6. We observe that even at a higher DNA concentration, the condensates dissolve over time. Therefore, the difference in the maturation behavior of condensates with varying initial protein ensembles is due to the nature of polyP (likely its enhanced flexibility).

      Reviewer #2 (Public review):

      Summary:

      In this study, Goyal et al. demonstrate that the assembly of proteins with polyphosphate into either condensates or aggregates can reveal information on the initial protein ensemble. They show that, unlike DNA, polyphosphate is able to effectively discriminate between initial protein ensembles with different conformational heterogeneity, structure, and compactness. The authors further show that the native protein ensemble determines whether polyphosphate induces phase separation or aggregation, whereas DNA induces a similar outcome regardless of the initial protein ensemble. This work provides a way to improve our mechanistic understanding of how conformational transitions of proteins may regulate or drive LLPS condensate and aggregate assemblies within biological systems.

      Strengths:

      This is a thoroughly conducted study that provides an alternative route for inducing phase separation that is more informative on the initial protein ensemble involved. This is particularly useful and a complementary means to investigate the role played by protein dynamics and plasticity in phase transitions. The authors use an appropriate set of techniques to investigate unique phase transitions within proteins induced by polyphosphates. An alternative protein system is used to corroborate their findings that the unique assemblies induced by polyphosphates when compared to DNA are not restricted to a single system. The work here is well-documented, easy to interpret, and of relevance for the condensate community.

      Weaknesses:

      The major weakness of this manuscript is that it is unclear whether the initial protein conformational ensemble can be determined solely from the assembly and maturation behavior and the discrimination abilities of polyphosphate. In both systems studied (CytR and FruR), polyphosphate discriminates between initial protein ensembles, resulting in unique assemblies and maturation behaviors. However, the assembly and maturation behavior do not seem to be a direct result of the degree of conformational dynamics and plasticity in the initial protein. In the case of CytR, the fully folded system forms condensates that resolubilize, while the highly disordered state immediately aggregates. In contrast, in the case of FruR, the folded state induces spontaneous aggregation, and the more dynamic, molten globular system results in short-lived condensates. These results suggest that polyphosphate's ability to discriminate between initial protein ensembles may not reveal what that initial ensemble is unless it is already known.

      We thank the reviewer for providing constructive comments on our work. On the final paragraph: we agree that the outcome does not provide information on the nature of the starting ensemble. As of now, our experimental results are primarily observations on questions related to maturation outcomes when protein ensembles of varying structure, compactness and stability interact with polyP. If there are differences in the native ensemble due to mutations (which at times cannot be revealed by ensemble probes), polyP appears to discern them more efficiently than DNA.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study aimed to investigate the effects of optically stimulating the A13 region in healthy mice and in a unilateral 6-OHDA mouse model of Parkinson's disease (PD). The primary objectives were to assess changes in locomotion, motor behaviors, and the neural connectome. For this, the authors examined the dopaminergic loss induced by 6-OHDA lesioning. They found a significant loss of tyrosine hydroxylase (TH+) neurons in the substantia nigra pars compacta (SNc), while the dopaminergic cells in the A13 region were largely preserved. They then optically stimulated the A13 region after using a viral vector to deliver channelrhodopsin under the CaMKII promoter. In both sham and PD model mice, optogenetic stimulation of the A13 region induced pro-locomotor effects, including increased locomotion, more locomotion bouts, longer durations of locomotion, and higher movement speeds. Additionally, PD model mice exhibited increased ipsilesional turning during A13 region photoactivation. Lastly, the authors used whole-brain imaging to explore changes in the A13 region's connectome after 6-OHDA lesions. These alterations involved a complex rewiring of neural circuits, impacting both afferent and efferent projections. In summary, this study unveiled the pro-locomotor effects of A13 region photoactivation in both healthy and PD model mice. The study also indicates the preservation of A13 dopaminergic cells and anatomical changes in neural circuitry following PD-like lesions that represent the anatomical substrate for a parallel motor pathway.

      Strengths:

      These findings hold significant relevance for the field of motor control, providing valuable insights into the organization of the motor system in mammals. Additionally, they offer potential avenues for addressing motor deficits in Parkinson's disease (PD). The study fills a crucial knowledge gap, underscoring its importance, and the results bolster its clinical relevance and overall strength.

      The authors adeptly set the stage for their research by framing the central questions in the introduction, and they provide thoughtful interpretations of the data in the discussion section. The results section, while straightforward, effectively supports the study's primary conclusion - the pro-locomotor effects of A13 region stimulation, both in normal motor control and in the 6-OHDA model of brain damage.

      We thank the reviewer for their positive comments.

      Weaknesses:

      (1) Anatomical investigation. I have a major concern regarding the anatomical investigation of plastic changes in the A13 connectome (Figures 4 and 5). While the methodology employed to assess the connectome is technically advanced and powerful, the results lack mechanistic insight at the cell or circuit level into the pro-locomotor effects of A13 region stimulation in both physiological and pathological conditions. This concern is exacerbated by a textual description of results that doesn't pinpoint precise brain areas or subareas but instead references large brain portions like the cortical plate, making it challenging to discern the implications for A13 stimulation. Lastly, the study is generally well-written with a smooth and straightforward style, but the connectome section presents challenges in readability and comprehension. The presentation of results, particularly the correlation matrices and correlation strength, doesn't facilitate biological understanding. It would be beneficial to explore specific pathways responsible for driving the locomotor effects of A13 stimulation, including examining the strength of connections to well-known locomotor-associated regions like the Pedunculopontine nucleus, Cuneiformis nucleus, LPGi, and others in the diencephalon, midbrain, pons, and medulla.

      We initially considered two approaches. The first was to look at specific projections to the motor regions, focusing on the MLR. The second was to utilize a whole-brain analysis, which is presented here. Given what we know about the zona incerta, especially its integrative role, we felt that examining the full connectome was a reasonable starting point.

      The value of the whole-brain approach is that it provides a high-level overview of the afferents and efferents to the region. The changes in the brain that occur following Parkinson-like lesions, such as those in the nigrostriatal pathway, are complex and can affect neighbouring regions such as the A13. Therefore, we wished to highlight the A13, which we considered a therapeutic target, and examine changes in connectivity that could occur following acute lesions affecting the SNc. We acknowledge that this study does not provide a causal link, but it presents the fundamental background information for subsequent hypothesis-driven, focused, region-specific analysis.

      The terms provided were taken from the Allen Brain Atlas terminology and presented as abbreviations. We have added two new figures focusing on motor regions to make the information more comprehensible (new Figures 4 and 5) and rewrote the connectomics section to make it easier to understand.

      Additionally, identifying the primary inputs to A13 associated with motor function would enhance the study's clarity and relevance.

      This is a great point to help simplify the whole-brain results. We have presented the motor-related inputs and outputs as part of a new figure in the main paper (Figure 5) and added accompanying text in the results section. We have also updated the correlation matrices to concentrate on motor regions (Figure 4). This highlights possible therapeutic pathways. We have also enhanced our discussion of these motor-related pathways. We have retained the entire dataset and added it to our data repository for those interested.

      The study raises intriguing questions about compensatory mechanisms in Parkinson's disease and offers a new perspective on the preservation of dopaminergic cells in A13 despite SNc degeneration, as well as on the plastic changes to input/output matrices. For inspiration toward a more straightforward reanalysis and discussion of the results, I recommend the authors refer to the paper from the David Kleinfeld laboratory titled "Specific populations of basal ganglia output neurons target distinct brain stem areas while collateralizing throughout the diencephalon." This could guide the authors in investigating motor pathways across different brain regions.

      Thank you for the advice. As pointed out, Kleinfeld’s group presented their data in a nice, focused way. For the connectomic piece, we have added Figure 5, which provides a better representation than our previous submission.

      (2) Description of locomotor performance. Figure 3 provides valuable data on the locomotor effects of A13 region photoactivation in both control and 6-OHDA mice. However, a more detailed analysis of the changes in locomotion during stimulation would enhance our understanding of the pro-locomotor effects, especially in the context of 6-OHDA lesions. For example, it would be informative to explore whether the probability of locomotion changes during stimulation in the control and 6-OHDA groups. Investigating reaction time, speed, and total distance could reveal how A13 influences locomotion, particularly after 6-OHDA lesions. The Whelan laboratory has deep knowledge of locomotion and the neural circuits driving it, so these features may be instructive for inferring the neural circuits driving movement. Along the same lines, examining features such as the frequency or power of stimulation in relation to walking patterns may help elucidate whether A13 engages the Mesencephalic Locomotor Region (MLR) to drive the pro-locomotor effects. These insights would provide a more comprehensive understanding of the mechanisms underlying A13-mediated locomotor changes in both healthy and pathological conditions.

      Thank you for these suggestions. We have reorganized Figure 3 to highlight the metrics by separating the 6-OHDA from the Sham experiments (3F-J, which highlights distance travelled, average speed and duration). We have also added additional text to highlight these metrics better in the text. We have relabelled Supplementary Figure S3, which presents reaction time as latency to initiate locomotion and updated the main text to address the reviewers' points.

      Reviewer #2 (Public Review):

      Summary:

      The paper by Kim et al. investigates the potential of stimulating the dopaminergic A13 region to promote locomotor restoration in a Parkinson's mouse model. In wild-type mice, 6-OHDA injection depletes dopaminergic neurons in the substantia nigra pars compacta without impairing those of the A13 region and the ventral tegmental area, as previously reported in humans. Moreover, photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region improves bradykinesia and akinetic symptoms after 6-OHDA injection. Whole-brain imaging with retrograde and anterograde tracers reveals that the A13 region undergoes substantial changes in the distribution of its afferents and projections after 6-OHDA injection. The study suggests that, although the remodeling of the A13 region connectome does not promote recovery following chronic dopaminergic depletion, photostimulation of the A13 region restores locomotor functions.

      Strengths:

      Photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region promotes locomotion and locomotor recovery of wild-type mice 1 month after 6-OHDA injection in the medial forebrain bundle, thus identifying a new potential target for restoring motor functions in Parkinson's disease patients.

      Weaknesses:

      Electrical stimulation of the medial Zona Incerta, in which the A13 region is located, has been previously reported to promote locomotion (Grossman et al., 1958). Recent mouse studies have shown that, whereas optogenetic or chemogenetic stimulation of GABAergic neurons of the Zona Incerta promotes and restores locomotor functions after 6-OHDA injection (Chen et al., 2023), stimulation of glutamatergic ZI neurons worsens motor symptoms after 6-OHDA (Lie et al., 2022).

      Thank you - we have added this reference. It is helpful as Grossman did stimulate the zona incerta in the cat and elicit locomotion, suggesting that stimulation of the area in normal mice has external validity. Grossman’s results prompted a later clinical examination of the zona incerta, but it concentrated on the zona incerta regions close to the subthalamic regions (Ossowska 2019), further caudal to the area we focused on. Chen et al. (2023) targeted the area in the lateral aspect of central/medial zona incerta, formed by dorsal and ventral zona incerta, which may account for the differing results. Our data were robust for stimulation of the medial aspect of the rostromedial zona incerta. The thigmotactic behaviour that we observed in our work that focused on CamKII neurons has not been observed with chemogenetic, optogenetic activation or with photoinhibition of GABAergic central/medial ZI (Chen et al. 2023).

      GABAergic activation of mZI to Cuneiform projections (Sharma et al. 2024) also did not produce thigmotactic behavior. We have added these points to the discussion.

      Although CAMKIIa is a marker of presumably excitatory neurons and can be used as an alternative marker of dopaminergic neurons, the behavioral results of this study raise questions about the neuronal population targeted in the vicinity of the A13 region. Moreover, although some YFP and ChR2-YFP neurons within the A13 region express TH (Fig. 2), a large population of transduced neurons within and outside the A13 region does not, suggesting the recruitment of other neuronal cell types that could be GABAergic or glutamatergic.

      We found that CamKII transfection of the A13 region was extremely effective in promoting locomotor activity, which was critical for our work in exploring its possible therapeutic potential. We have since quantified cell numbers and found that the number of c-fos+ cells increased following ChR2 activation. There is evidence of TH activation, but the data suggest that other cell types contribute. C-fos alone is a blunt tool for assessing specificity; it is better at showing overall photostimulus efficacy, which we have demonstrated. Moreover, there is evidence that cell types are not purely dopaminergic, with GABA co-localized (Negishi et al. 2020). We acknowledge that specific viral approaches that target the GABAergic, glutamatergic, and dopaminergic circuits would be very useful. The range of tools to target A13 dopaminergic circuits is more limited than for the SNc, for example, because the A13 region lacks DAT, and TH-IRES-Cre approaches, while helpful, are less specific than DAT-Cre mouse models. Intersectional approaches targeting multiple transmitters (glutamate and dopamine, for example) may be one solution, as we do not expect that a single transmitter-specific pathway would work as well as broad targeting of the A13 region. Our recent work suggests that GABAergic neuron activation may have more general effects on behaviour rather than controlling ongoing locomotor parameters (Sharma et al. 2024). Recent work shows a positive valence effect of dopaminergic A13 activation on motivated food-seeking behavior, which differs from the consummatory behavior observed with GABAergic modulation (Ye, Nunez, and Zhang 2023). Chemogenetic inactivation and ablation of dopaminergic A13 neurons revealed that they contribute to grip strength and prehensile movements, uncoupling food-seeking grasping behavior from motivational factors (Garau et al. 2023). Overall, this suggests that GABAergic cell types have effects that differ from those of dopaminergic and/or glutamatergic cell types, consistent with the effects we observed when stimulating CamKII neurons. The discussion has been updated.

      Regarding the analysis of interregional connectivity of the A13 region, there is a lack of specificity (the viral approach did not specifically target the A13 region), the number of mice is low for such correlation analyses (2 sham and 3 6-OHDA mice), and there are no statistics comparing 6-OHDA versus sham (Fig. 4) or contra- versus ipsilesional sides (Fig. 5). Moreover, the data are too processed, and the color matrices (Fig. 4) are too packed in the current format to enable proper visualization of the data. The A13 afferents/efferents analysis is based on normalized relative values; absolute values should also be presented to support the claim about their upregulation or downregulation.

      Generally, papers using tissue-clearing imaging approaches have low sample sizes due to technical complexity. The technical challenges of obtaining these data were substantial in both collection and analysis. There are multiple technical complexities arising from dual injections (A13 and MFB coordinates) and from targeting the area correctly. The A13 region is difficult to target, as it spans only around 300 µm along the anterior-posterior axis. Clearing the brain takes weeks and light-sheet imaging also takes time, but analyzing the tissue with whole-brain quantification is the most labor-intensive step, especially given the lack of a standardized pipeline for atlas registration, signal segmentation, and quantification. The field is still relatively new, and pipelines require additional time to mature.

      Correlation matrices are often used in analyzing connectivity patterns on a brain-wide scale, as they can identify observable patterns within a large amount of data. We used correlation matrices to display estimated correlation coefficients between the afferent and efferent proportions of each pair of brain subregions, across 251 brain regions in total (not for hypothesis testing). We provided descriptive statistics (mean and error bars) in the original Figure 5C and G. As mentioned in our response to Reviewer 1, we now present the data in revised Figures 4 and 5, which focus specifically on motor-related pathways to provide information on possible pathways. This has simplified the correlation matrices and highlighted the differences, especially in the 6-OHDA efferent data. As suggested, raw values are shared in a supplemental file on our data repository.

      In the absence of changes in the number of dopaminergic A13 neurons after 6-OHDA injection, results from this correlation analysis are difficult to interpret as they might reflect changes from various impaired brain regions independently of the A13 region.

      We acknowledge that models of Parkinson’s disease, particularly those using 6-OHDA, induce plasticity in various regions, which may subsequently affect A13 connectivity. We aim to emphasize the residual, intact A13 pathways that could serve as therapeutic targets in future investigations. This emphasis is pertinent in the context of potential clinical applications, as the overall input and output to the region fundamentally dictate the significance of the A13 region in lesioned nigrostriatal models. We agree with the reviewer that the changes certainly can be independent of A13; however, the fact that there was a significant change in the connectome post-6-OHDA injection and striatonigral degeneration is in and of itself important to document. We have added a sentence acknowledging this limitation to the discussion.

      There is no causal link between anatomical and behavioral data, which raises questions about the relevance of the anatomical data.

      This point was also addressed earlier in response to a comment from Reviewer 1. Focusing on specific motor pathways is one avenue to explore. However, given that the zona incerta acts as an integrative hub, we believed it was prudent to first examine both afferent and efferent pathways using a brain-wide approach. For instance, without this methodology, the potential significance of cortical interconnectivity to the A13 region might not have been fully appreciated. As mentioned previously, we have placed additional emphasis on motor-related regions in our revised paper, thereby enhancing the relevance of the anatomical data presented. With these modifications, we anticipate that our data will highlight specific motor-related targets for future exploration, using optogenetic targeting to assess necessity and sufficiency.

      Overall, the study does not take advantage of genetic tools accessible in the mouse to address the direct or indirect behavioral and anatomical contributions of the A13 region to motor control and recovery after 6-OHDA injection.

      Our study did not specifically target neurons that express dopaminergic, glutamatergic, or GABAergic properties (see our earlier comment for more detail). However, like others, we find that targeting one neuronal population often does not yield a pure transmitter phenotype. For instance, evidence suggests co-localization of dopamine neurons with a subpopulation of GABA neurons in the A13/medial zona incerta (Negishi et al. 2020). In the hypothalamus, Romanov et al. (2017) showed the presence of multiple classes of dopamine cells, each containing different ratios of co-localized peptides and/or fast neurotransmitters. Consequently, we believe our work lays the foundation for the investigations suggested by the reviewer. Furthermore, if one considers this work as a preclinical study to determine whether the A13 might be a target in human Parkinson's disease, the existing technology that could be used is deep brain stimulation (DBS) or electrical modulation, which would also affect different neuronal populations non-specifically.

      While optogenetic stimulation therapy is a longer-term prospect, using CamKII combined with the DJ hybrid AAV could be a translatable strategy for targeting A13 neuronal populations in non-human primates (Watakabe et al. 2015; Watanabe et al. 2020). We have added this to the discussion.

      Reviewer #3 (Public Review):

      Kim, Lognon et al. present an important finding on the pro-locomotor effects of optogenetic activation of the A13 region, which they identify as a dopamine-containing area of the medial zona incerta that undergoes profound remodeling of its afferent and efferent connectivity after administration of 6-OHDA to the MFB. The authors claim to address a model of PD-related gait dysfunction, a contentious problem that can be difficult to treat with dopaminergic medication or DBS in conventional targets. They use an impressive array of technologies to gain insight into the role of A13 remodeling in the 6-OHDA model of PD. The evidence provided is solid and the paper is well written, but several general issues and a number of specific, more minor ones reduce the value of the paper in its current form. Some suggestions that may improve the paper also come to mind.

      Thank you for the suggestions and careful consideration of our work - it is appreciated.

      The most fundamental issue that needs to be addressed is the relation of the structural to the behavioral findings. It would be very interesting to see whether the structural heterogeneity in afferent/efferent projections induced by 6-OHDA is related to the degree of symptom severity and motor improvement during A13 stimulation.

      As mentioned in comments for Reviewer 1, we have performed additional analysis and present this in Figure 5. We have also revised Figure 4, focusing on motor regions. Our work will provide a roadmap for future studies to disentangle divergent or convergent A13 pathways that are involved in different or all PD-related motor symptoms. Because we could not measure behavioural change in the same animals studied with the anatomic study (essentially because the optrode would have significantly disrupted the connectome we are measuring), we cannot directly compare behaviour to structure.

      The authors provide extensive interrogation of large-scale changes in the organization of the A13 region afferent and efferent distributions. It remains unclear how many animals were included to produce Fig 4 and 5. Fig S5 suggests that only 3 animals were used, is that correct? Please provide details about the heterogeneity between animals. Please provide a table detailing how many animals were used for which experiment. Were the same animals used for several experiments?

      The behavioral set and the anatomical set were necessarily distinct. In the anatomical experiments, we employed both anterograde and retrograde viral approaches to label the afferent and efferent A13 populations with fluorescent proteins. For the behavioral approach, a single ChR2 opsin was used to photostimulate the A13 region; hence, combining the two approaches in the same animals was not feasible. We were also concerned that the optrode itself would interfere with connectomics. Fewer animals were used for the whole-brain work due to the technical limitations described earlier. We now provide additional information on animal numbers in all figures and in the text. Using Spearman's correlation analysis, we found afferent and efferent proportions to be consistent across animals, with an average correlation of 0.91, as reported in Figure S6.
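      To make the consistency check concrete, the following is a minimal Python sketch, not the authors' analysis pipeline. It assumes that each animal's data are a vector of afferent (or efferent) proportions over the same ordered list of brain regions, and that the "average correlation" is the mean Spearman's rho over all pairs of animals; the function names and this averaging scheme are hypothetical. Spearman's rho is computed as the Pearson correlation of ranks, with tied values given their average rank.

```python
import math
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_pairwise_spearman(proportions):
    """Hypothetical helper: mean Spearman's rho over all pairs of animals.

    proportions maps an animal ID to its per-region proportions,
    with regions listed in the same order for every animal.
    """
    pairs = combinations(proportions, 2)
    return mean(spearman(proportions[a], proportions[b]) for a, b in pairs)
```

      Averaging over all animal pairs treats each animal symmetrically. Because Spearman's rho depends only on rank order, it is less sensitive than Pearson's r to a few regions with very large proportions.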

      While the authors provide evidence that photoactivation of the A13 is sufficient to drive locomotion in the OFT, this pro-locomotor effect seems to be independent of 6-OHDA-induced pathophysiology. Only in the pole test do they find a difference between Sham and 6-OHDA mice in the effects of A13 photoactivation. Given these behavioral findings, optogenetic activation of A13 may represent a gain of function rather than a disease-specific rescue. This needs to be highlighted more explicitly in the title, abstract, and conclusion.

      Optogenetic activation of A13 may represent a gain of function in both healthy and 6-OHDA mice, highlighting a parallel descending motor pathway that remains intact. 6-OHDA lesions have multiple effects on motor and cognitive function. This makes a single pathway unlikely to rescue all deficits observed in 6-OHDA models. The lack of locomotion observed in 6-OHDA models can be reversed by A13 region photostimulation. Therefore, this is a reversal of a loss of function, in this case. However, the increase in turning represents a gain of function. We have highlighted this as suggested in the discussion.

      The authors claim that A13 may be a possible target for DBS to treat gait dysfunction. However, the experimental evidence provided (in particular the lack of disease-specific changes in the OFT) seems insufficient to draw such conclusions. It needs to be highlighted that optogenetic activation does not necessarily have the same effects as DBS (see the recent review from Neumann et al. in Brain: https://pubmed.ncbi.nlm.nih.gov/37450573/). This is important because ZI-DBS so far had very mixed clinical effects. The authors should provide plausible reasons for these discrepancies. Is cell-specificity, which only optogenetic interventions can achieve, necessary? Can new forms of cyclic burst DBS achieve similar specificity (Spix et al, Science 2021)? Please comment.

      Thank you for the valuable comments. They have been incorporated into the discussion.

      Our study highlights a parallel motor pathway provided by the A13 region that remains intact in 6-OHDA mice and can be driven sufficiently to rescue the hypolocomotor pathology observed in the OFT and to overcome bradykinesia and akinesia. Photoactivation of the ipsilesional A13 also has an overall additive effect on ipsiversive circling, representing a gain of function on the intact side that contributes to the magnitude of overall motor asymmetry against the lesioned side. The effects of DBS are complex, ranging across micro-, meso-, and macro-scales and involving activation, inhibition, informational lesioning, and network interactions. This could contribute to the mixed clinical effects observed with ZI-DBS, in addition to differences in targeting and DBS programming among studies (see the review by Ossowska 2019). Also, DBS studies targeting the ZI have never targeted the rostromedial ZI, which extends towards the hypothalamus and contains the A13. Furthermore, DBS and electrical stimulation of neural tissue in general are always limited by current spread and by the lower activation thresholds of axons (e.g., axons of passage), both of which can reduce specificity for the true therapeutic target. Optogenetic studies have provided mechanistic insights that could be leveraged to overcome some of the limitations of targeting with conventional DBS approaches. Spix et al. (2021) provided an interesting approach highlighting these advancements: they devised burst stimulation to achieve population-specific neuromodulation within the external globus pallidus. Moreover, they found a complementary role for optogenetics in exploring the pathway-specific activation of neurons by DBS. To ascertain whether A13 DBS may be a viable therapy for PD gait, many more preclinical experiments will be necessary, and tuning of DBS parameters could be guided by optogenetic stimulation in these murine models. We have added this to the discussion.

      In a recent study, Jeon et al (Topographic connectivity and cellular profiling reveal detailed input pathways and functionally distinct cell types in the subthalamic nucleus, 2022, Cell Reports) provided evidence on the topographically graded organization of STN afferents and McElvain et al. (Specific populations of basal ganglia output neurons target distinct brain stem areas while collateralizing throughout the diencephalon, 2021, Neuron) have shown similar topographical resolution for SNr efferents. Can a similar topographical organization of efferents and afferents be derived for the A13/ ZI in total?

      The ZI can be subdivided into four subregions along the antero-posterior axis: rostral (ZIr), dorsal (ZId), ventral (ZIv), and caudal (ZIc). The dorsal and ventral ZI are also referred to together as the central/medial/intermediate ZI. There are topographical gradients in cell types and connectivity across these subregions (see reviews: Mitrofanis 2005; Monosov et al. 2022; Ossowska 2019). Recent work by Yang and colleagues (2022) demonstrated a topographical organization of the inputs and outputs of GABAergic (VGAT) populations across the four ZI subregions. Given that the A13 region encompasses a smaller portion (the medial aspect) of both the rostral and the medial/central ZI (three of the four ZI subregions) and coexpresses VGAT, the A13 region likely falls within the rostral and intermediate medial ZI dataset of Yang et al. (2022). Our data would not allow us to capture the breadth of topographical organization shown in Yang et al. (2022).

      In conclusion, this is an interesting study that can be improved by taking into consideration the points mentioned above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 2 indeed presents valuable information regarding the effects of A13 region photoactivation. To enhance the comprehensiveness of this figure and gain a deeper understanding of the neurons driving the pro-locomotor effect of stimulation, it would be beneficial to include quantifications of various cell types:

      • cFos-Positive Cells/TH-Positive Cells: it can help determine the impact of A13 stimulation on dopaminergic neurons and the associated pro-locomotor effect in the healthy condition and especially in the context of Parkinson's disease (PD) modeling.

      • cFos-Positive Cells /TH-Negative Cells: Investigating the number of TH-negative cells activated by stimulation is also important, as it may reveal non-dopaminergic neurons that play a role in locomotor responses. Identifying the location and characteristics of these TH-negative cells can provide insights into their functional significance.

      We have completed this analysis. The data is presented in Figure 2F, where we show increased c-fos intensity with photoactivation. We observed an increase in the number of cells activated in the A13 region. However, we did not definitively see increases in TH+ cells, suggesting a heterogeneous set of neurons responsible for the effects—possibly glutamatergic neurons.

      Incorporating these quantifications into Figure 2 would enhance the figure's informativeness and provide a more comprehensive view of the neuronal populations involved in the locomotor effects of A13 stimulation.

      We have added text and a new graph.

      (2) Refer to Figure 3. In the main text (page 5) when describing the animal with 6-OHDA the wrong panels are indicated. It is indicated in Figure 2A-E but it should be replaced with 3A-E.

      Please do that.

      Done, and we have updated the figure to improve readability, by separating the 6-OHDA findings from sham in all graphs.

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Page 1: Inhibitory or lesion studies will be necessary to support the claim that the global remodeling of afferent and efferent projections of the A13 region highlights the Zona Incerta's role as a crucial hub for the rapid selection of motor function.

      Overall, there is quite a bit of evidence that the zona incerta is a hub for afferent/efferents.

      Mitrofanis (2005) and, more recently, Wang et al. (2020) summarize some of the evidence. Yang (2022) illustrates that the zona incerta shows multiple inputs to GABAergic neurons and outputs to diverse regions. Recent work suggests that the zona incerta contributes to various motor functions such as hunting, exploratory locomotion, and integrating multiple modalities (Zhao et al. 2019; Wang et al. 2019; Monosov et al. 2022; Chometton et al. 2017). The introduction has been updated.

      Introduction

      Page 2, paragraph 2: "However, little attention has been placed on the medial zona incerta (mZI), particularly the A13, the only dopamine-containing region of the rostral ZI" Is the A13 region located in the rostral or medial ZI or both?

      It should have been written “rostromedial” ZI. The A13 is located in the medial aspect of rostromedial ZI. Introduction has been updated.

      Page 2, para 3: Li et al (2021) used a mini-endoscope to record the GCaMP6 signal. Masini and Kiehn, 2022 transiently blocked the dopaminergic transmission; they never used 6-OHDA.

      Please correct through the text.

      Corrected.

      Page 2, para 4: the A13 connectome encompasses the cerebral cortex,... MLR. The MLR is a functional region, correct this for the CNF and PPN.

      Corrected.

      Page 3, the last paragraph of the introduction could be clarified by presenting the behavioral data first, followed by the anatomy.

      This has been corrected.

      Figure 1 is nice and clear, and well summarizes the experimental design.

      Thank you.

      Figure 2 shows an example of the extent of the ChR2-YFP expression and the position of an optical fiber tip above the dopaminergic A13 region from a mouse. Without any quantification, these images could be included in Figure 1. Despite a very small volume (36.8nL) of AAV, the extent of ChR2-YFP expression is quite large and includes dopaminergic and unidentified neurons within the A13 region but also a large population of unidentified neurons outside of it, thus raising questions about the volume and the types of neurons recruited.

      This is an important consideration. Viral spread is complex and depends on factors including tissue type, serotype, and the promoter of the virus. Li et al. (2021), for example, used different virus serotypes and promoters and injected 150 nL, whereas we used AAV-DJ and injected 36.8 nL. AAV-DJ is a hybrid viral type derived from multiple serotypes. It has a high transduction efficiency, which leads to greater gene delivery than single-serotype AAV constructs (Mao et al. 2016). A secondary consideration regarding translation was that AAV-DJ can effectively transduce non-human primate neurons (Watanabe et al. 2020). We have addressed the question of which neurons are recruited earlier, provided c-Fos quantification, and added a new supplementary figure showing viral spread (Figure S1).

      Anatomical reconstruction of the extent of the ChR2-YFP expression and the location of the tip of the optical fiber will be necessary to confirm that ChR2-YFP expression was restricted to the A13 region.

      We have provided additional information regarding viral spread, ferrule tip placement, and c-Fos cell counts in Figure 2. We also present a new Figure S1 in which we quantify the viral spread.

      Page 5, 1st para: Double-check the references, as not all of them are 6-OHDA injections in the MLF.

      Corrected; we removed the Kiehn reference.

      Page 5, 1st para, 4th line: Replace ferrule with optical cannula or fiber.

      Done

      Page 5, 1st para, 9th line: Replace Figure 2 with Figure 3.

      Done

      Page 5, 2nd para: About the refractory decrease in traveled distance by sham-ChR2 mice: is this significant?

      It was not significant (Figure S1C; one-way RM ANOVA: F(5,25) = 0.486, P = 0.783). This has been updated in the text.

      Figure 3 showing behavioral assessments is nice, but the stats are not always clear. In Fig 3A, are each of the off and on boxes 1 minute long? The figure legend states the test lasts 1 min, but isn't it 4 minutes? In Figure 3B-E and 3J-M, what are the differences? Do the stats identify a significant difference only during the stimulation phase? Fig. 3F-I are nice and could have been presented as primary examples prior to data analysis in Fig. 3B-E. Group labels above the graph would help.

      Yes, the off and on boxes are each 1 minute long; the error in the legend has been corrected. Thank you for the suggestion regarding Fig. 3F-I; these panels have been moved ahead of the summary panels. We have also updated the new Fig. 3F-I, J, L, and M to make the differences between the 6-OHDA and sham graphs easier to visualize. The statistics do indicate a significant difference during the stimulation phase. We have added group labels and reorganized the figure, which is now much easier to read.

      Fig. 3L-M, what do PreSur, Post, and Ferrule mean? I assume that Ferrule refers to mice tested with the optical fiber without stimulation, whereas Stim. refers to the stimulation. It would be helpful to standardize the format of stats in Fig. 3B-E and 3-J-M. What are time points a, b, and c referring to?

      We have renamed the figure labels to make them more intuitive. We have standardized the presentation of statistics in the figure and eliminated the a, b, c nomenclature. We have also updated the caption to describe the tests shown in Fig. 3L-M.

      Figure S2A: the higher variability in 6-OHDA-YFP mice in comparison to 6-OHDA-ChR2 mice prior to stimulation suggests that 6-OHDA-YFP mice were less impaired. Why use boxplots only for these data? Would a pairwise comparison be more appropriate?

      We have removed these plots from Figure S2. We now present the Baseline to Pre values across the experimental timespan to illustrate the fact that distance travelled returned to baseline values for all trials conducted.

      Fig. S2B: add the statistical marker.

      We have removed this from Figure S2.

      Page 7, para 1, line 8: add "in comparison to 6-OHDA-YFP and YFP mice" to during photostimulation... (Figure 3E).

      Done

      Page 7, para 3, line 5: about larger improvement, replace "sham ChR2" with "6-OHDA."

      Done

      Page 8, para 1, line 4: Perier et al., 2000 reported that 6-OHDA injection increased the firing frequency of the ZI over a month.

      Added the timeframe to this sentence.

      Page 8, para 2, line 1: Since the results were expected, add some references.

      Done.

      Page 8, para 3, line 4. Double-check the reference.

      Corrected.

      Page 8: About large-scale changes in the A13 region, the relevance of correlation matrices is difficult to grasp. Analysis of local connectivity would have been more informative in the context of GABAergic and glutamatergic neurons of the ZI in the vicinity of the A13 region.

      We have updated the connectivity figures throughout the manuscript, and there are now new Figures 4 and 5 in the main text. We also provide a revised Supplementary Figure 8. Unfortunately, we were not able to perform the local connectivity analysis. In light of our new work (Sharma et al. 2024), it is clear that this will be critical going forward.

      Page 8, para 3, line: given Fig. 2, there is concern about the claim that only the A13 region was targeted. The time of the analysis after 6-OHDA should be mentioned. Some sections of the paragraph could be moved to methods.

      We have provided more information about the viral spread in the text and in Supplementary Figure 1. The functional and anatomical experiments were performed separately, which we realize caused confusion. The text now states the time of analysis after 6-OHDA injection.

      Fig. 4: The color code helps the reader visualize distribution differences. However, statistical analyses comparing 6-OHDA versus sham should be included. Quantification per region would greatly help readers visualize the data and support the conclusion. The relationship between the type of correlation (positive or negative) and absolute change (increase or decrease) is unknown in the current format, which limits the interpretation of the data. Moreover, examples of raw images of axons and cells should be presented for several brain regions. The experimental design with a timeline, as in Fig. 1, would be helpful. The legend for Fig. 4 is a bit long. Some sections are very descriptive, whereas others are more interpretive.

      We have provided a new Figure 5 in which we present quantification per region, and the correlation matrices have been updated in Figure 4. As mentioned earlier, we have also focused on motor regions. Examples of raw images from several regions are provided in Supplementary Figure 8, and raw values are shared on our data repository.

      Page 10, para 1, line 1: add "afferent" to "changes in -afferent and- projection patterns."

      Done

      Page 10, para 1, line 9: remove the 2nd "compared to sham" in the sentence.

      Done

      Page 10, para 1, line 10: remove "coordinated" in "several regions showed a coordinated reduction in afferent density." We cannot say anything about the timing of events, as there is only info at 1 month.

      Done

      Page 10, para 2: the section should be written in the past tense.

      Done

      Page 13, para 2, the last sentence is overstated. Please remove "cells" and refer to the A13 region instead.

      Done

      About differential remodelling of the A13 region connectome: Figure 5C and 5G: The proportion of total afferents ipsi- and contralateral to 6-OHDA injection argues that the A13 region primarily receives inputs from the cortical plate and the striatum. Unfortunately, there are no statistics.

      Due to the small sample size, we provided descriptive statistics (mean and error bars) in Figure 5A. As mentioned in our responses to Reviewers 1 and 2, we have revised Figure 5 to focus on motor-related pathways for clarity. In addition, absolute values are shared on our data repository.

      Figure 5 D and 5H: Changes in the proportion of total afferents/projections are relatively modest (less than 10% of the whole population for the highest changes). There is no standard deviation for these data and no statistics. Do they reflect real changes or variability from the injection site?

      The changes are relatively modest (less than 10%) since a small brain region usually provides a small proportion of total input (McElvain et al. 2021; Yang et al. 2022). The changes in the proportions reflect real differences between average proportions observed in sham and 6-OHDA mice. The variability in the total labelling of neurons and fibers was minimized by normalizing individual regional counts against total counts found in each animal. This figure has been updated as reviewers requested.
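      The normalization described in this response can be illustrated with a minimal sketch. It assumes that each animal's data are stored as a mapping from region name to raw count. Each regional count is divided by that animal's total count, the proportions are averaged within each group, and the group difference is reported in percentage points. The function names, region names, and counts are hypothetical. This is not the authors' analysis pipeline or data.

```python
def normalize_counts(counts):
    """Express one animal's per-region counts as proportions of its total count."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("animal has no labelled cells or fibres")
    return {region: n / total for region, n in counts.items()}


def mean_proportions(animals):
    """Average the normalized proportion of each region across a group of animals."""
    if not animals:
        raise ValueError("group is empty")
    normalized = [normalize_counts(a) for a in animals]
    regions = set().union(*normalized)  # every region seen in any animal
    return {r: sum(a.get(r, 0.0) for a in normalized) / len(normalized) for r in regions}


def proportion_change(sham, lesioned):
    """Difference in mean proportion (lesioned minus sham), in percentage points."""
    m_sham = mean_proportions(sham)
    m_les = mean_proportions(lesioned)
    regions = sorted(set(m_sham) | set(m_les))
    return {r: 100 * (m_les.get(r, 0.0) - m_sham.get(r, 0.0)) for r in regions}


if __name__ == "__main__":
    # Hypothetical counts (not data from the study), two animals per group.
    sham = [
        {"cortical plate": 620, "striatum": 240, "other": 140},
        {"cortical plate": 1150, "striatum": 500, "other": 350},
    ]
    lesioned = [
        {"cortical plate": 540, "striatum": 200, "other": 160},
        {"cortical plate": 980, "striatum": 410, "other": 310},
    ]
    for region, delta in proportion_change(sham, lesioned).items():
        print(f"{region}: {delta:+.1f} percentage points")
```

      Each animal's proportions sum to 1, so the per-region changes sum to zero. A gain in one region must therefore be offset by losses elsewhere. This is one reason why a shift of a few percentage points can still be meaningful.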

      Fig 5F and H: The example in F shows a huge decrease in the striatum, but H indicates only a 2% change, which makes the example not very representative. Absolute values would be helpful.

      While a 2% change may seem small, it represents a relatively large change in the A13 efferent connectome. To provide further clarity, we have provided absolute values as suggested in our new supplemental table.

      Figure 6 is inaccurate and unnecessary.

      Figure 6 has been removed.

      Discussion

      Although interesting, the discussion is too long.

      The discussion has been reduced by about three quarters of a page.

      Methods

      Page 17, para 1: include the stereotaxic coordinates of the optical cannula above the A13 region.

      Added.

      References

      Chen, Fenghua, Junliang Qian, Zhongkai Cao, Ang Li, Juntao Cui, Limin Shi, and Junxia Xie. 2023. “Chemogenetic and Optogenetic Stimulation of Zona Incerta GABAergic Neurons Ameliorates Motor Impairment in Parkinson’s Disease.” iScience 26 (7). https://doi.org/10.1016/j.isci.2023.107149.

      Chometton, S., K. Charrière, L. Bayer, C. Houdayer, G. Franchi, F. Poncet, D. Fellmann, and P. Y. Risold. 2017. “The Rostromedial Zona Incerta Is Involved in Attentional Processes While Adjacent LHA Responds to Arousal: C-Fos and Anatomical Evidence.” Brain Structure & Function 222 (6): 2507–25.

      Garau, Celia, Jessica Hayes, Giulia Chiacchierini, James E. McCutcheon, and John Apergis-Schoute. 2023. “Involvement of A13 Dopaminergic Neurons in Prehensile Movements but Not Reward in the Rat.” Current Biology: CB, October. https://doi.org/10.1016/j.cub.2023.09.044.

      Li, Zhuoliang, Giorgio Rizzi, and Kelly R. Tan. 2021. “Zona Incerta Subpopulations Differentially Encode and Modulate Anxiety.” Science Advances 7 (37): eabf6709.

      Mao, Yingying, Xuejun Wang, Renhe Yan, Wei Hu, Andrew Li, Shengqi Wang, and Hongwei Li. 2016. “Single Point Mutation in Adeno-Associated Viral Vectors -DJ Capsid Leads to Improvement for Gene Delivery in Vivo.” BMC Biotechnology 16 (January):1.

      McElvain, Lauren E., Yuncong Chen, Jeffrey D. Moore, G. Stefano Brigidi, Brenda L. Bloodgood, Byung Kook Lim, Rui M. Costa, and David Kleinfeld. 2021. “Specific Populations of Basal Ganglia Output Neurons Target Distinct Brain Stem Areas While Collateralizing throughout the Diencephalon.” Neuron 109 (10): 1721–38.e4.

      Mitrofanis, J. 2005. “Some Certainty for the ‘Zone of Uncertainty’? Exploring the Function of the Zona Incerta.” Neuroscience 130 (1): 1–15.

      Monosov, Ilya E., Takaya Ogasawara, Suzanne N. Haber, J. Alexander Heimel, and Mehran Ahmadlou. 2022. “The Zona Incerta in Control of Novelty Seeking and Investigation across Species.” Current Opinion in Neurobiology 77 (December):102650.

      Negishi, Kenichiro, Mikayla A. Payant, Kayla S. Schumacker, Gabor Wittmann, Rebecca M. Butler, Ronald M. Lechan, Harry W. M. Steinbusch, Arshad M. Khan, and Melissa J. Chee. 2020. “Distributions of Hypothalamic Neuron Populations Coexpressing Tyrosine Hydroxylase and the Vesicular GABA Transporter in the Mouse.” The Journal of Comparative Neurology 528 (11): 1833–55.

      Ossowska, Krystyna. 2019. “Zona Incerta as a Therapeutic Target in Parkinson’s Disease.” Journal of Neurology. https://doi.org/10.1007/s00415-019-09486-8.

      Romanov, Roman A., Amit Zeisel, Joanne Bakker, Fatima Girach, Arash Hellysaz, Raju Tomer, Alán Alpár, et al. 2017. “Molecular Interrogation of Hypothalamic Organization Reveals Distinct Dopamine Neuronal Subtypes.” Nature Neuroscience 20 (2): 176–88.

      Sharma, Sandeep, Cecilia A. Badenhorst, Donovan M. Ashby, Stephanie A. Di Vito, Michelle A. Tran, Zahra Ghavasieh, Gurleen K. Grewal, Cole R. Belway, Alexander McGirr, and Patrick J. Whelan. 2024. “Inhibitory Medial Zona Incerta Pathway Drives Exploratory Behavior by Inhibiting Glutamatergic Cuneiform Neurons.” Nature Communications 15 (1): 1160.

      Spix, Teresa A., Shruti Nanivadekar, Noelle Toong, Irene M. Kaplow, Brian R. Isett, Yazel Goksen, Andreas R. Pfenning, and Aryn H. Gittis. 2021. “Population-Specific Neuromodulation Prolongs Therapeutic Benefits of Deep Brain Stimulation.” Science 374 (6564): 201–6.

      Wang, Xiyue, Xiaolin Chou, Bo Peng, Li Shen, Junxiang J. Huang, Li I. Zhang, and Huizhong W. Tao. 2019. “A Cross-Modality Enhancement of Defensive Flight via Parvalbumin Neurons in Zona Incerta.” eLife 8 (April). https://doi.org/10.7554/eLife.42728.

      Wang, Xiyue, Xiao-Lin Chou, Li I. Zhang, and Huizhong Whit Tao. 2020. “Zona Incerta: An Integrative Node for Global Behavioral Modulation.” Trends in Neurosciences 43 (2): 82–87.

      Watakabe, Akiya, Masanari Ohtsuka, Masaharu Kinoshita, Masafumi Takaji, Kaoru Isa, Hiroaki Mizukami, Keiya Ozawa, Tadashi Isa, and Tetsuo Yamamori. 2015. “Comparative Analyses of Adeno-Associated Viral Vector Serotypes 1, 2, 5, 8 and 9 in Marmoset, Mouse and Macaque Cerebral Cortex.” Neuroscience Research 93 (April):144–57.

      Watanabe, Hidenori, Hiromi Sano, Satomi Chiken, Kenta Kobayashi, Yuko Fukata, Masaki Fukata, Hajime Mushiake, and Atsushi Nambu. 2020. “Forelimb Movements Evoked by Optogenetic Stimulation of the Macaque Motor Cortex.” Nature Communications 11 (1): 3253.

      Yang, Yang, Tao Jiang, Xueyan Jia, Jing Yuan, Xiangning Li, and Hui Gong. 2022. “Whole-Brain Connectome of GABAergic Neurons in the Mouse Zona Incerta.” Neuroscience Bulletin 38 (11): 1315–29.

      Ye, Qiying, Jeremiah Nunez, and Xiaobing Zhang. 2023. “Zona Incerta Dopamine Neurons Encode Motivational Vigor in Food Seeking.” bioRxiv: The Preprint Server for Biology, June. https://doi.org/10.1101/2023.06.29.547060.

      Zhao, Zheng-Dong, Zongming Chen, Xinkuan Xiang, Mengna Hu, Hengchang Xie, Xiaoning Jia, Fang Cai, et al. 2019. “Zona Incerta GABAergic Neurons Integrate Prey-Related Sensory Signals and Induce an Appetitive Drive to Promote Hunting.” Nature Neuroscience 22 (6): 921–32.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary

      In this extensive comparative study, Moreno-Borrallo and colleagues examine the relationships between plasma glucose levels, albumin glycation levels, diet and life-history traits across birds. Their results confirmed the expected positive relationship between plasma blood glucose level and albumin glycation rate but also provided findings that are somewhat surprising or contrast with findings of some previous studies (positive relationships between blood glucose and lifespan, or absent relationships between blood glucose and clutch mass or diet). This is the first extensive comparative analysis of glycation rates and their relationships to plasma glucose levels and life-history traits in birds that is based on data collected in a single study, with blood glucose and glycation measured using unified analytical methods (except for blood glucose data for 13 species collected from a database).

      Strengths

      This is an emerging topic gaining momentum in evolutionary physiology, which makes this study a timely, novel and important contribution. The study is based on a novel data set collected by the authors from 88 bird species (67 in captivity, 21 in the wild) of 22 orders, except for 13 species, for which data were collected from a database of veterinary and animal care records of zoo animals (ZIMS). This novel data set itself greatly contributes to the pool of available data on avian glycemia, as previous comparative studies either extracted data from various studies or a ZIMS database (therefore potentially containing much more noise due to different methodologies or other unstandardised factors), or only collected data from a single order, namely Passeriformes. The data further represents the first comparative avian data set on albumin glycation obtained using a unified methodology. The authors used LC-MS to determine glycation levels, which does not have problems with specificity and sensitivity that may occur with assays used in previous studies. The data analysis is thorough, and the conclusions are substantiated. Overall, this is an important study representing a substantial contribution to the emerging field of evolutionary physiology focused on ecology and evolution of blood/plasma glucose levels and resistance to glycation.

      Weaknesses

      Unfortunately, the authors did not record handling time (i.e., time elapsed between capture and blood sampling), which may be an important source of noise because handling-stress-induced increase in blood glucose has previously been reported. Moreover, the authors themselves demonstrate that handling stress increases variance in blood glucose levels. Both effects (elevated mean and variance) are evident in Figure ESM1.2. However, this likely makes their significant findings regarding glucose levels and their associations with lifespan or glycation rate more conservative, as highlighted by the authors.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I understand that your main objective regarding glycation rate and lifespan was to analyse the species' resistance to glycation with respect to lifespan, while factoring out the species-specific variation in blood glucose level. However, I still believe that the absolute glycation level (i.e., not controlled for blood glucose level) may also be important for the evolution of lifespan. Given that blood glucose is positively related to both glycation and lifespan (although with a plateau in the latter case), lifespan could possibly be positively correlated with absolute glycation levels. If significant, that would be an interesting and counterintuitive finding, which would call for an explanation, thereby potentially stimulating further research. If not significant, it would show that long-lived species do not have higher glycation levels, despite having higher blood glucose levels, thereby strengthening your argument about higher resistance of long-lived species to glycation. So, in my opinion, the inclusion of an additional model of glycation level on life-history traits, without controlling for blood glucose, is worth considering.

      We now include this model as supplementary material and refer to it in several parts of the text, including where we discuss some of the issues raised here.

      Lines 230-231: Please, provide a citation for these GVIF thresholds

      We now include it.

      Figure 3: I think that showing both glucose and glycation rate on the linear scale, rather than log scale, would better illustrate your conclusion - the slowing rise of glycation rate with increasing glucose levels.

      That is a good point, although readers may find it confusing to see a graph that represents the data differently from the models. Perhaps showing both graphs (as Figures 3A and 3B) would address this?

      Figure 4. I recommend stating in the caption that the whiskers do not represent interquartile ranges (a standard option in box plots) but credible intervals as mentioned in the current version of the public author response.

      Sorry, this was missed; it is now included. Note that the boxes still show the interquartile ranges of the posterior distributions, while the whiskers show the credible intervals.
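      As a minimal sketch (not the authors' analysis code), the following shows how such a box-and-whisker summary could be computed from posterior draws using Python's standard `statistics` module. The box spans the interquartile range of the draws, and the whiskers span an equal-tailed credible interval. The 95% level and the simulated draws are assumptions for illustration; the response does not state the interval level.

```python
import random
import statistics


def posterior_box_summary(draws):
    """Summarise posterior draws for a box plot.

    Box: 25th-75th percentiles (interquartile range) of the draws.
    Whiskers: equal-tailed 95% credible interval (2.5th-97.5th percentiles).
    The 95% level is an assumption for illustration.
    """
    if len(draws) < 40:
        raise ValueError("need enough draws to estimate tail percentiles")
    q1, median, q3 = statistics.quantiles(draws, n=4)
    # n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(draws, n=40)
    return {
        "ci_low": cuts[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "ci_high": cuts[-1],
    }


if __name__ == "__main__":
    # Simulated posterior draws for one hypothetical model coefficient.
    rng = random.Random(1)
    draws = [rng.gauss(0.3, 0.1) for _ in range(4000)]
    for name, value in posterior_box_summary(draws).items():
        print(f"{name}: {value:.3f}")
```

      Because both summaries come from the same posterior draws, the whiskers always enclose the box. A box plot with standard Tukey whiskers would instead extend to 1.5 times the interquartile range.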

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Guo and colleagues used a cell rounding assay to screen a library of compounds for inhibition of TcdB, an important toxin produced by Clostridioides difficile. Caffeic acid and derivatives were identified as promising leads, and caffeic acid phenethyl ester (CAPE) was further investigated.

      Strengths:

      Considering the high morbidity rate associated with C. difficile infections (CDI), this manuscript presents valuable research in the investigation of novel therapeutics to combat this pressing issue. Given the rising antibiotic resistance in CDI, the significance of this work is particularly noteworthy. The authors employed a robust set of methods and confirmatory tests, which strengthened the validity of the findings. The explanations provided are clear, and the scientific rationale behind the results is well-articulated. The manuscript is extremely well-written and organized. There is a clear flow in the description of the experiments performed. Also, the authors have investigated the effects of CAPE on TcdB in careful detail and reported compelling evidence that this is a meaningful and potentially useful metabolite for further studies.

      Weaknesses:

      This is really a manuscript about CAPE, not caffeic acid, and the title should reflect that. Also, a few details are missing from the description of the experiments. The authors should carefully revise the manuscript to ascertain that all details that could affect the interpretation of their results are presented clearly. Just as an example, the authors state in the results section that TcdB was incubated with compounds and then added to cells. Was there a wash step in between? Could compound carryover affect how the cells reacted independently from TcdB? This is just an example of how the authors should be careful with descriptions of their experimental procedures. Lastly, authors should be careful when drawing conclusions from the analysis of microbiota composition data. Ascribing causality to correlational relationships is a recurring issue in the microbiome field. Therefore, I suggest authors carefully revise the manuscript and tone down some statements about the impact of CAPE treatment on the gut microbiota.

      Thanks for your constructive suggestion. We have carefully revised the manuscript, including the title and the Results and Methods sections.

      Reviewer #2 (Public review):

      Summary:

      This work is towards the development of nonantibiotic treatment for C. difficile. The authors screened a chemical library for activity against the C. difficile toxin TcdB, and found a group of compounds with antitoxin activity. Caffeic acid derivatives were highly represented within this group of antitoxin compounds, and the remaining portion of this work involves defining the mechanism of action of caffeic acid phenethyl ester (CAPE) and testing CAPE in mouse C. difficile infection model. The authors conclude CAPE attenuates C. difficile disease by limiting toxin activity and increasing microbial diversity during C. difficile infection.

      Strengths/ Weaknesses:

      The strategy employed by the authors is sound although not necessarily novel. A compound that can target multiple steps in the pathogenesis of C. difficile would be an exciting finding. However, the data presented does not convincingly demonstrate that CAPE attenuates C. difficile disease and the mechanism of action of CAPE is not convincingly defined. The following points highlight the rationale for my evaluation.

      (1) The toxin exposure in tissue culture seems brief (Figure 1). Do longer incubation times between the toxin and cells still show CAPE prevents toxin activity?

      Thanks for your comments. The cytotoxicity assay was employed to directly assess the protective capacity of CAPE against cell death induced by TcdB. Our observations at 1 and 12 h post-TcdB exposure revealed that CAPE effectively mitigated the toxic effects of TcdB at both time points, demonstrating its potent protective role. Please see Figure S1.

      (2) The conclusion that CAPE has antitoxin activity during infection would be strengthened if the mouse was pretreated with CAPE before toxin injections (Figure 1D).

      Thanks for your constructive comments. According to your suggestion, we administered TcdB 2 h after pretreatment with CAPE. The outcomes demonstrated that CAPE pretreatment significantly enhanced the survival rate of the intoxicated mice, confirming that CAPE retains its antitoxin efficacy during the infection process. Please see Figure S2.

      (3) CAPE does not bind to TcdB with high affinity as shown by SPR (Figure 4). A higher affinity may be necessary to inhibit TcdB during infection. The GTD binds with millimolar affinity and does not show saturable binding. Is the GTD the binding site for CAPE? Auto processing is also affected by CAPE indicating CAPE is binding non-GTD sites on TcdB.

      Thanks for your comments. Our findings indicate that the GTD is a critical binding site for CAPE. CAPE exerts its protective effects at multiple stages of TcdB-mediated cell death, including inhibiting TcdB's self-cleavage and blocking the activity of the GTD, thereby preventing TcdB-mediated glycosylation of Rac1.

      (4) In the infection model, CAPE does not statistically significantly attenuate weight loss during C. difficile infection (Figure 6). I recognize that weight loss is an indirect measure of C. difficile disease but histopathology also does not show substantial disease alleviation (see below).

      Thanks for your comments. Our comparative analysis revealed a notable difference in the body weight of mice on the third day post-infection (Figure 6B). Similarly, the dry/wet stool ratio showed a comparable pattern, suggesting that treatment with caffeic acid phenethyl ester (CAPE) significantly ameliorated C. difficile-induced diarrhea (Figure 6C).

      (5) In the infection model (Figure 6), the histopathology analysis shows substantial improvement in edema but limited improvement in cellular infiltration and epithelial damage. Histopathology is probably the most critical parameter in this model and a compound with disease-modifying effects should provide substantial improvements.

      Thanks for your comments. Edema, inflammatory cell infiltration, and epithelial damage served as key evaluation metrics. Statistical analysis revealed that the pathological scores of mice treated with CAPE were markedly reduced compared to those in the model group (Figure 6F).

      (6) The reduction in C. difficile colonization is interesting. It is unclear if this is due to antitoxin activity and/or due to CAPE modifying the gut microbiota and metabolites (Figure 6). To interpret these data, a control is needed that has CAPE treatment without C. difficile infection or infection with an atoxicogenic strain.

      The observed reduction in C. difficile fecal colonization following drug treatment may be attributed to CAPE's antitoxin properties or to its capacity to modify the intestinal microbiota and metabolites. These two mechanisms likely work in tandem to combat CDI. CDI is primarily triggered by the toxins A (TcdA) and B (TcdB) secreted by the bacterium. Certain therapies, including monoclonal antibodies such as bezlotoxumab, target CDI by neutralizing these toxins, thereby mitigating gut damage and subsequent C. difficile colonization (1,2). The establishment of C. difficile in the gut is intricately linked to the equilibrium of the intestinal microbiota. Although antibiotic treatments can inhibit C. difficile growth, they may also disrupt the microbial balance, potentially facilitating the overgrowth of other pathogens. Consequently, interventions such as fecal microbiota transplantation (FMT) are designed to reestablish the balance of the gut flora and thereby decrease C. difficile colonization (3,4). Moreover, probiotics and prebiotics are considered to reduce C. difficile colonization by modifying the gut environment (5,6).

      (7) Similar to the CAPE data, the melatonin data does not display potent antitoxin activity and the mouse model experiment shows marginal improvement in the histopathological analysis (Figure 9). Using 100 µg/ml of melatonin (~ 400 micromolar) to inactivate TcdB in cell culture seems high. Can that level be achieved in the gut?

      The uptake and distribution of melatonin within the body vary with the dose administered. For instance, the oral bioavailability of melatonin was 53.5% in rats, whereas in dogs it was nearly complete (100%) at a dose of 10 mg/kg but decreased to 16.9% at a lower dose of 1 mg/kg (7). These data show that melatonin absorption differs across animal species and depends on the dose administered. They also point to the potentially high bioavailability of melatonin, suggesting that a dose of 200 mg/kg should be adequate to achieve the desired concentration in the body after administration.
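      The reviewer's conversion of 100 µg/mL melatonin to roughly 400 µM can be checked with a short calculation. The calculation uses melatonin's molar mass of about 232.28 g/mol (C13H16N2O2), a standard value that does not appear in the original text:

```latex
c = \frac{100\ \mu\mathrm{g\,mL^{-1}}}{232.28\ \mathrm{g\,mol^{-1}}}
  = \frac{0.1\ \mathrm{g\,L^{-1}}}{232.28\ \mathrm{g\,mol^{-1}}}
  \approx 4.3 \times 10^{-4}\ \mathrm{mol\,L^{-1}}
  \approx 430\ \mu\mathrm{M}
```

      This concentration applies to the in vitro pre-incubation. Whether a comparable level is reached in the gut depends on the dose and absorption factors discussed in the response.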

      (8) The following parameters should be considered and would aid in the interpretation of this work. Does CAPE directly affect the growth of C. difficile? Does CAPE affect the secretion of TcdB from C. difficile? Does CAPE alter the sporulation and germination of C. diffcile?

      We tested CAPE in an MIC assay against C. difficile, in an assay of C. difficile sporulation capacity, and in an assessment of TcdB secretion. The findings revealed that CAPE markedly repressed tcdB transcription at a concentration of 16 μg/mL and effectively suppressed the growth and sporulation of C. difficile BAA-1870 at a concentration of 32 μg/mL. Please see Figure S3.

      References:

      (1) Skinner AM, et al. Efficacy of bezlotoxumab to prevent recurrent Clostridioides difficile infection (CDI) in patients with multiple prior recurrent CDI. Anaerobe. 2023 Dec; 84: 102788.

      (2) Wilcox MH, et al. Bezlotoxumab for Prevention of Recurrent Clostridium difficile Infection. N Engl J Med. 2017 Jan 26;376(4):305-317.

      (3) Khoruts A, Sadowsky MJ. Understanding the mechanisms of faecal microbiota transplantation. Nat Rev Gastroenterol Hepatol. 2016 Sep;13(9):508-16.

      (4) Khoruts A, Staley C, Sadowsky MJ. Faecal microbiota transplantation for Clostridioides difficile: mechanisms and pharmacology. Nat Rev Gastroenterol Hepatol. 2021 Jan;18(1):67-80.

      (5) Mills JP, Rao K, Young VB. Probiotics for prevention of Clostridium difficile infection. Curr Opin Gastroenterol. 2018 Jan;34(1):3-10.

      (6) Lau CS, Chamberlain RS. Probiotics are effective at preventing Clostridium difficile-associated diarrhea: a systematic review and meta-analysis. Int J Gen Med. 2016 Feb 22; 9:27-37.

      (7) Yeleswaram K, et al. Pharmacokinetics and oral bioavailability of exogenous melatonin in preclinical animal models and clinical implications. J Pineal Res. 1997 Jan;22(1):45-51.

      Reviewer #3 (Public review):

      Summary:

      The study is well written, and the results are solid and well demonstrated. It shows a field that can be explored for the treatment of CDI.

      Strengths:

      The results are really good, and the CAPE shows a good and promising alternative for treating CDI. The methodology and results are well presented, with tables and figures that corroborate them. It is solid work and very promising.

      Weaknesses:

      Some references are too old or missing.

      Thanks for your constructive suggestion. We have added and updated several references to strengthen the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      While the manuscript convincingly demonstrates that CAPE affects the TcdB toxin and reduces its toxicity in vitro, it would be beneficial to include data on the effect of CAPE on the growth of C. difficile. This would help ensure that the observed in vivo effects are not merely due to reduced bacterial growth but rather due to the specific action of CAPE on the toxin.

      Thanks for your constructive suggestion. We have augmented our findings with the impact of CAPE on the bacteria themselves, revealing that CAPE not only hampers the growth of the bacterial cells but also suppresses their capacity to produce spores. Please see Figure S3.

      (1) Line 41, line 115 - authors should clarify what they mean when mentioning Bacteroides within parentheses.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (2) Line 71 - Is C. difficile really found "in the environment"?

      Thanks for your comments. C. difficile is prevalent across various natural settings, including soil and water ecosystems. One study identified highly diverse strains of this bacterium in environmental samples(1). Moreover, the high prevalence of C. difficile in soil, mulch and lawn samples collected from the grounds of Western Australian hospitals indicates that the organism is a common inhabitant of the environment(2).

      (3) Lines 128-130 - Was there a wash step here? What could be the impact of compound carryover in this experiment?

      Thanks for your comments. Yes, there was a wash step: following pre-incubation of TcdB with CAPE, compound that had not bound to TcdB was removed by centrifugation. Without this step, compound persisting in the culture could inflate the apparent efficacy, particularly if it continued to engage with TcdB or the cells beyond the initial 1-h pre-incubation window. Carryover might also produce misleading positive results, in which the compound appears to protect against TcdB-mediated cell rounding when the effect is actually due to residual compound activity. It could likewise skew the determination of the minimum effective concentration, because the concentration actually reaching the cells would be higher than intended. Furthermore, if the compound were cytotoxic or affected cell viability, carryover could generate changes in cell morphology unrelated to the direct interaction between TcdB and the compound.

      (4) Lines 133-134 - I suggest authors mention how many caffeic acid derivatives there were in the entire library so that the suggested "enrichment" of them in the group of bioactive compounds can be better judged.

      Thanks for your comments. The natural compound library contained eight caffeic acid derivatives, of which methyl caffeic acid and ferulic acid displayed no efficacy. This information has been incorporated into the manuscript.

      (5) Line 135 - I recommend the authors add the molarity of the compound solutions used.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (6) Line 247 - I think the term "CAPE mice" is confusing. Please use a full description.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (7) Line 248 - I also think the terms "model mice" and "model group" are confusing. Maybe call them "control mice"?

      Thanks for your comments. The terms "model mice" and "model group" are indeed synonymous. We have also clarified that "control mice" refers to mice that were not infected with C. difficile.

      (8) Line 273 - "most abundant species at the genus level" is incorrect. I think what you mean is "most abundant TAXA".

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (9) Line 278 - Please include your p-value cut-off together with the LDA score.

      Thanks for your comments. We have revised the above description to “LDA score > 3.5, p < 0.05”.

      (10) Line 292 - Details on how metabolomics was performed should be included here.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (11) Line 299 - 1.5 is a fairly low cut-off. The authors should at a minimum also include the p-value cut-off used.

      Thanks for your comments. We have revised the above description to “fold change > 1.5, p < 0.05”.

      (12) Line 307 - Purine "degradation" would be better here.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (13) Line 328 onward - The melatonin experiment is a weird one. Although I fully understand the rationale behind testing the effect of melatonin in the mouse model, the idea that just because melatonin levels changed in the gut it would act as a direct inhibitor of TcdB was very far-fetched, even though it ended up working. Authors should explain this in the manuscript.

      Thanks for your comments. Beyond our murine studies, we confirmed that melatonin significantly reduces TcdB-induced cytotoxicity at the cellular level (Figure 9A). Additionally, melatonin has been reported to act as an antimicrobial adjuvant and anti-inflammatory agent that can decrease the recurrence of CDI(3). We therefore consider the statement to be substantiated.

      (14) Lines 429-435 - There are seemingly contradictory pieces of information here. The authors state that adenosine is released from cells upon inflammation and that CAPE treatment caused an increase in adenosine levels. Later in this section, the authors state that adenosine prevents TcdA-mediated damage and inflammation. This should be clarified and better discussed.

      Thanks for your comments. Adenosine modulates immune responses and inflammatory cascades by interacting with its receptors; among these effects, it can suppress the secretion of specific pro-inflammatory mediators. We have updated this description in the manuscript.

      (15) Lines 513-514 - How was this phenotype quantified?

      Thanks for your comments. Initially, we introduced TcdB at a final concentration of 0.2 ng/mL along with various concentrations of compounds into 1 mL of medium for a 1-h pre-incubation period. Subsequently, unbound compounds were removed through centrifugation, and the resulting mixture was then applied to the cells.

      (16) Figure 3 - panels are labeled incorrectly.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (17) Figure 5C - it is unclear what the different colors and labels represent.

      Thanks for your comments. In the depicted graph, blue denotes the total binding energy, red signifies the electrostatic interactions, green corresponds to the van der Waals forces, and orange indicates solvation or hydration effects. The horizontal axis represents the mutation of the amino acid residue at the respective position to alanine. As illustrated in Figure 5C, the mutations W520A and GTD exhibit the highest binding energies.

      References:

      (1) Janezic S, et al. Highly Divergent Clostridium difficile Strains Isolated from the Environment. PLoS One. 2016 Nov 23;11(11):e0167101.

      (2) Perumalsamy S, Putsathit P, Riley TV. High prevalence of Clostridium difficile in soil, mulch and lawn samples from the grounds of Western Australian hospitals. Anaerobe. 2019 Dec;60:102065.

      (3) Sutton SS, et al. Melatonin as an Antimicrobial Adjuvant and Anti-Inflammatory for the Management of Recurrent Clostridioides difficile Infection. Antibiotics (Basel). 2022 Oct 25;11(11):1472.

      Reviewer #2 (Recommendations for the authors):

      Minor comments and questions.

      (1) Which form of TcdB is being used in these experiments?

      Thanks for your comments. The TcdB used in this study belongs to the TcdB1 subtype.

      (2) Why are THP-1 cells being used in these assays?

      Thanks for your comments. In this study, we used a diverse array of cell lines, including Vero, HeLa, THP-1, Caco-2, and HEK293T, each selected for a specific experimental objective. THP-1 cells were included so that a macrophage cell line could be tested alongside epithelial cells, making the experiments more comprehensive. Because C. difficile is an intestinal pathogen and immune clearance plays a vital role during infection, we used THP-1 cells as a representative immune cell type.

      (3) Please improve the quality of the microscopy images in Figure 1.

      Thanks for your comments. We have improved the quality of the microscopy images in Figure 1.

      (4) Does the flow cytometry experiment in Figure 2B show internalization? Surface-bound toxins would provide the same histogram.

      Thanks for your comments. Figure 2B was employed to assess the internalization of TcdB, and the findings indicate that CAPE does not influence the internalization process of TcdB.

      (5) The sensorgram in Figure 4A does not look typical and should be clarified.

      Thanks for your comments. Typically, small molecules bind to and dissociate from proteins rapidly. However, as shown in Figure 4A, the interaction between CAPE and TcdB approaches equilibrium gradually. This behavior can be explained primarily by the rapid occupation of the protein's primary binding sites by the small molecule in the initial phase, followed by binding of CAPE to secondary, lower-affinity sites, which extends the time needed to reach equilibrium. In addition, if CAPE binds to multiple sites on TcdB, time is required to explore and occupy these different sites before equilibrium is attained. We have incorporated an analysis of this potential scenario into the manuscript.
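The two-phase approach to equilibrium described in this response can be illustrated with a simple kinetic model. The Python sketch below is a minimal illustration, assuming two independent binding sites that each follow pseudo-first-order association; the amplitudes and rate constants are hypothetical and are not fitted to the Figure 4A data or taken from the manuscript.

```python
import math


def two_site_response(t, r1, k1, r2, k2):
    """Association-phase response (RU) for two independent binding sites.

    Assumes pseudo-first-order binding at each site: r1 and r2 are the maximal
    responses of the fast and slow sites, k1 and k2 their observed rate constants (1/s).
    """
    fast = r1 * (1.0 - math.exp(-k1 * t))
    slow = r2 * (1.0 - math.exp(-k2 * t))
    return fast + slow


def time_to_fraction(fraction, r1, k1, r2, k2, t_max=10_000.0, step=0.1):
    """First sampled time (s) at which the response reaches `fraction` of the plateau r1 + r2."""
    target = fraction * (r1 + r2)
    n_steps = int(t_max / step)
    for i in range(n_steps + 1):
        t = i * step  # multiply rather than accumulate, to avoid float drift
        if two_site_response(t, r1, k1, r2, k2) >= target:
            return t
    return None  # plateau fraction not reached within t_max


# Hypothetical parameters, chosen only for illustration
single_site = time_to_fraction(0.95, 100.0, 0.1, 0.0, 0.0)
two_sites = time_to_fraction(0.95, 60.0, 0.1, 40.0, 0.005)
print(f"single site: ~{single_site:.0f} s to 95% of plateau")
print(f"fast + slow site: ~{two_sites:.0f} s to 95% of plateau")
```

With these made-up parameters, adding a slow site that carries 40% of the signal makes the response take more than ten times longer to reach 95% of its plateau, which is the qualitative behavior the response describes; it does not establish that CAPE actually binds two sites.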

      Reviewer #3 (Recommendations for the authors):

      These are my suggestions for the text:

      (1) Line 29: high recurrent rates.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (2) Line 32: Where is the caffeic acid identified? I think a line should be included.

      Thanks for your comments. Caffeic acid was identified from a natural compound library, and we have made the corresponding modification according to the suggestion.

      (3) Line 39: C. difficile is not italic.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (4) Line 41: Bacteroides spp.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (5) Line 56: Is this number of deaths (56,000) still current, or does it refer to the past?

      Thanks for your comments. The mortality rates reported in the manuscript reflect a downturn in the incidence and fatality of CDI around 2017(1), as the infection gained broader recognition. Nonetheless, a recent study reveals that the mortality rate for CDI cases in Germany can soar to 45.7% within a year, with the overall economic burden amounting to approximately 1.6 billion euros. This underscores the ongoing significance of CDI as a global public health challenge(2).

      (6) Line 104: Where did the idea of testing caffeic acid come from? Any previous study of the authors? Any studies with the inhibition of other pathogens?

      Thanks for your comments. Initially, we conducted a screen of a compound library comprising 2,076 compounds and identified several potent inhibitors, which, upon structural analysis, were revealed to be caffeic acid derivatives. Prior to our investigation, no studies had explored the potential of CAPE in this context.

      (7) Line 115: Bacteroides spp.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      Results section

      (8) Did the authors test caffeic acid against TcdA or the binary toxin? I know this is not the purpose of the study, but TcdA shares high structural identity with TcdB and causes inflammation in the gut via neutrophils. Strains that are negative for the major toxins but positive for the binary toxin also cause severe cases of CDI.

      Thanks for your comments. Although we acknowledge the significance of TcdA and binary toxins in CDI, we did not investigate the impact of CAPE on these toxins. Our focus was exclusively on the effect of CAPE against TcdB, as it is the primary virulence factor in C. difficile pathogenesis. Since TcdA and TcdB are highly similar in structure, we will analyze the neutralization effect of CAPE on TcdA in later studies.

      (9) Does caffeic acid have any effect on C. difficile? Or does it only target the toxins? That would be ideal.

      Thanks for your comments. We have included additional related assays in our study. Beyond directly neutralizing TcdB, CAPE also demonstrates the capacity to inhibit the growth and spore formation of C. difficile.

      (10) Line 230: C. difficile BAA-1870 is a clinical strain? There are no details about it in the paper.

      Thanks for your comments. C. difficile BAA-1870 (RT027/ST1), a highly virulent isolate frequently used in research(3-6), was kindly donated by Professor Aiwu Wu. We have added the PCR ribotype to the manuscript.

      (11) Line 236: Did the mice fully recover from CDI after the administration of the CAPE? Was one dose enough?

      Thanks for your comments. CAPE was administered orally at 24 h intervals, commencing with the initial dose on Day 0. By the time a significant difference was observed on Day 3, the treatment had been administered a total of three times.

      Methodology

      (12) Most of the methods do not have a reference.

      Thanks for your comments. We have added several references to the methods.

      Discussion section

      (13) The first two paragraphs of the discussion should be summarized. Those details were already explained in the introduction.

      Thanks for your comments. The discussion section and the introduction address slightly different focal points; therefore, we aim to retain the first two paragraphs to maintain continuity and context.

      (14) Line 382: Bezlotoxumab was approved by the FDA in 2016. It is not recent.

      Thanks for your comments. We have revised the above description.

      (15) Line 410: "Despite the high cure rate and increasing popularity of FMT, its safety remains controversial." Although this is true, the FDA recently (2022) approved Rebyota, which is cited later by the authors.

      Thanks for your comments. We have revised the above description.

      (16) Lines 415-416: "the abundance of Bacteroides, a critical gut microbiota component that is required for C. difficile resistance". There is only one reference cited by the authors. I suppose that if it is true, more studies should be mentioned. Why are probiotics with Bacteroides spp. not available in the market?

      Thanks for your comments. We have added further references. The scarcity of probiotic products containing Bacteroides spp. on the market is primarily attributable to their stringent growth requirements. Because most Bacteroides spp. are anaerobic, they thrive in oxygen-deprived environments. This trait makes it difficult to maintain their viability during product storage and distribution, which in turn increases production costs and complexity. Furthermore, despite the significant role of Bacteroides in gut health, research into their potential probiotic benefits and safety remains comparatively limited.

      References:

      (1) Guh AY, et al. Emerging Infections Program Clostridioides difficile Infection Working Group. Trends in U.S. Burden of Clostridioides difficile Infection and Outcomes. N Engl J Med. 2020 Apr 2;382(14):1320-1330.

      (2) Schley K, et al. Costs and Outcomes of Clostridioides difficile Infections in Germany: A Retrospective Health Claims Data Analysis. Infect Dis Ther. 2024 Nov 20.

      (3) Saito R, et al. Hypervirulent clade 2, ribotype 019/sequence type 67 Clostridioides difficile strain from Japan. Gut Pathog. 2019 Nov 4;11:54.

      (4) Pellissery AJ, Vinayamohan PG, Venkitanarayanan K. In vitro antivirulence activity of baicalin against Clostridioides difficile. J Med Microbiol. 2020 Apr;69(4):631-639.

      (5) Shao X, et al. Chemical Space Exploration around Thieno[3,2-d]pyrimidin-4(3H)-one Scaffold Led to a Novel Class of Highly Active Clostridium difficile Inhibitors. J Med Chem. 2019 Nov 14;62(21):9772-9791.

      (6) Mooyottu S, Flock G, Venkitanarayanan K. Carvacrol reduces Clostridium difficile sporulation and spore outgrowth in vitro. J Med Microbiol. 2017 Aug;66(8):1229-1234.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      Chabukswar et al analysed endogenous retrovirus (ERV) Env variation in a set of primate genomes using consensus Env sequences from ERVs known to be present in hominoids using a Blast homology search with the aim of characterising env gene changes over time. The retrieved sequences were analysed phylogenetically, and showed that some of the integrations are LTR-env recombinants.

      Strengths

      The strength of the manuscript is that such an analysis has not been performed yet for the subset of ERV Env genes selected and most of the publicly available primate genomes.

      Weaknesses

      Unfortunately, the weaknesses of the manuscript outnumber its strengths. In particular, the Methods section does not contain sufficient information to appreciate or interpret the results. The Results section contains methodological information that should be moved, while the presentation of the data is often substandard. For instance, the long lists of genomes in which a certain Env was found would be better shown in tables. Furthermore, there is no overview of the primate genomes, or accession numbers, used. It is unclear whether the analyses, such as the phylogenetic trees, are based on nucleotide or amino acid sequences, since this is not stated. tBLASTn was used in the homology searches, so one would suppose that aa sequences are retrieved. In the Discussion, both env (nt?) and Env (aa?) are used.

      For the non-hominoids, genome assembly of publicly available sequences is not always optimal, and this may require BLASTing a second genome from the same species. This should, for instance, be done for the HML2 sequences found in the Saimiri boliviensis genome but not in the related Callithrix jacchus genome. Finally, the authors propose to analyse recombination in Env sequences but only retrieve env-LTR recombinant Envs, which should likely not have passed the quality check.

      Since the Methods section does not contain sufficient information to understand or reproduce the results, while the Results are described in a messy way, it is unclear whether or not the aims have been achieved. I believe not, as characterisation of env gene changes over time is only shown for a few aberrant integrations containing part of the LTR in the env ORF.

      We thank the reviewer for the critiques of the manuscript and their constructive suggestions to improve the clarity, methodological rigor, and data presentation.

      (1) The concern regarding insufficient information in the Methods has been resolved in the revised manuscript by adding a supplementary file that lists the genome assemblies used for the tBLASTn analysis with the reconstructed Env sequences. The requested accession numbers are provided for all sequences in the supplementary phylogenetic figures.

      (2) We have also moved a portion of the Results section to the Methods section, in particular the entire methodological description of the Env reconstruction (Lines 197-231).

      (3) As suggested, the long lists of genomes in which Env tBLASTn hits were obtained, previously given in the Results section, are now provided as a table (Table 2) summarizing the distribution of ERV Env in the genomes, and the genome assemblies are listed in Supplementary file 2.

      (4) Regarding the use of tBLASTn in the homology searches, we used the reconstructed Env amino acid sequences as queries in tBLASTn similarity searches against the primate genomes. The tBLASTn algorithm compares amino acid sequences with a nucleotide database translated in all six reading frames; hence, the hits obtained are nucleotide sequences (Lines 381-383). These nt sequences were used for all further analyses, such as sequence alignment, phylogenetic analysis and recombination analysis. For clarity, we have now specified in the Methods section that env nt alignments were used, to avoid the confusion raised regarding the Discussion.

      (5) For the characterization of the HML supergroup in the squirrel monkey genome (Saimiri boliviensis), we used the tBLASTn hits obtained in S. boliviensis in the initial analysis to perform comparative genomics in two Platyrrhini genomes available on the UCSC Genome Browser. In particular, this analysis was performed to confirm the presence in squirrel monkey genomes of specific members of the HML supergroup that had not been previously reported. We used these genome assemblies because of the annotations available on the Genome Browser, especially the RepeatMasker tracks and the comparative genomics tools, which allow the human genome to be used as a reference. We reported the coordinates of the members of the HML supergroup that were retrieved from these comparative genomic assemblies by applying the RepeatMasker custom track, which includes many ERVs that are not present in NCBI reference genomes.

      (6) The concern regarding the retrieval of only env-LTR recombinant Envs has been addressed in the revised Results section (Lines 747-758). As mentioned in the Methods section, the RDP software detects recombinant sequences and predicts breakpoint positions for the recombination signals; we therefore used comparative genomics to confirm only those sequences that RDP predicted as potential recombinants. All sequences predicted by the software were env-LTR recombinants, and hence we confirmed and reported only these recombinant sequences in the manuscript.
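Point (4) above rests on how tBLASTn treats its inputs: the protein query is compared against the nucleotide database translated in all six reading frames, so hits map back to nucleotide coordinates. The Python sketch below, using only the standard library, illustrates just the six-frame translation step with the standard genetic code (NCBI table 1). It is not the authors' pipeline and not BLAST itself: `find_peptide` is a hypothetical toy that reports exact matches only, whereas tBLASTn uses seeded local alignment scored with a substitution matrix (BLOSUM62 by default).

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (N is kept as N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq):
    """Translate codon by codon; ambiguous codons become 'X', a trailing partial codon is dropped."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Protein translation of all six reading frames, keyed '+1'..'+3' and '-1'..'-3'."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def find_peptide(dna, peptide):
    """Hypothetical toy stand-in for a tBLASTn hit: exact peptide matches as (frame, aa_offset)."""
    hits = []
    for frame, protein in six_frame_translations(dna).items():
        start = protein.find(peptide)
        while start != -1:
            hits.append((frame, start))
            start = protein.find(peptide, start + 1)
    return hits


# A short ORF placed on the reverse strand is only recovered in a minus frame
print(find_peptide("TTATTTCCAGGCCAT", "MAWK"))
```

The example shows why a protein query can recover an env region regardless of which strand it lies on, and why the retrieved hits are nucleotide sequences that can then be aligned as nt.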

      Reviewer #1 (Recommendations for the authors):

      The paper could be strengthened by:

      - a rigorous rewriting and shortening of the manuscript, thereby eliminating all textbook-like paragraphs, and all biological misinterpretations and confusions. Distinguish between retroviral replication as an exogenous virus, and host genome remodeling affecting ERVs. Rewrite the sections on template switching by RT being the basis for the observed recombinations, while host genome recombinations are far more likely. ERVs with such aberrant env/LTR gene recombination are unlikely to be fit for cross-species transmission. Likely, such a recombinant was generated in a common ancestor. Also, host RNA polymerase II transcribes retroviral RNA (line 79), not RT.

      - check lines 89-90 as pro is part of the pol gene in gamma- and lentiviruses.

      We thank the reviewer for the suggestion. We have revised the manuscript by shortening the Introduction, eliminating the textbook-like paragraphs, and clarifying the recombination mechanism. The revised Introduction is at Lines 102-111, and the clarification of the recombination mechanism is provided at Lines 1668-1675.

      - adding much more information to the Methods section, such as which genomes were searched, whether nt or aa sequences were retrieved and analysed, whether multiple genomes of a species were searched, a list of the databases used ('various databases' in line 164 does not suffice), etc.

      We thank the reviewer for the observation. As mentioned above, the revised manuscript provides more detailed methods, including a supplementary file listing the genome assemblies used for the tBLASTn analysis and comparative genomics. We used nt sequences for the sequence alignment, phylogenetic analysis and recombination analysis, as is now stated in the revised version. Lastly, all the databases used are now listed in the Methods section.

      - more information is needed on the alignments and phylogenetic trees. For instance, how were indels treated? How long were the alignments on average regarding informative sites?

      We thank the reviewer for these questions. To answer them, we have added a paragraph (Lines 359-362) describing the reconstruction process in more detail.

      - confirm the findings about the presence or absence of an ERV, such as for the squirrel monkey genome, using additional genomes of the species

      As mentioned above, we used only the genome assemblies available on the Genome Browser because of their annotations. BLASTing a second NCBI RefSeq genome with the BLAST algorithm does not provide information and annotations as accurate as those of the Genome Browser. Hence, we reported the coordinates of the members of the HML supergroup that were retrieved from the comparative genomic assemblies by applying the RepeatMasker custom track, which includes many ERVs that are not present in NCBI reference genomes.

      - present the lists of findings in primate genomes on pages 9 and 10 in tables

      We thank the reviewer for the suggestion. We have provided a new table (Table 2) in the revised version summarizing the ERV Env distribution results.

      - a significant limitation of the study is that only env ERVs found in hominoids have been searched in OWM and NWM, not ones specific for monkeys. This should be mentioned somewhere.

      As the reviewer pointed out, the study was designed to explore ERV Env sequences from hominoids, which were then searched for in the OWM and NWM genomes. This is now stated more clearly in the Introduction (Lines 57-60).

      - define abbreviations at first use (e.g. HML in abstract)

      We thank the reviewer for the suggestion. We have defined the abbreviations at first use in the abstract, including HML (Line 65).

      - explain 'pathological domestication' (line 42). Domestication implies usefulness to the host. And over time, deleterious insertions would have been likely purged from a population.

      We thank the reviewer for the observation. We have modified the sentence and provided a clearer explanation of the pathological and physiological consequences of ERV env (Lines 52-57).

      Furthermore:

      - why begin the discussion with a lengthy description of domestication and syncytins, which is not part of the current study?

      We thank the reviewer for the critique. We have now shortened the part of the Discussion on the domestication of syncytins, which are mentioned only as an example (Lines 942-944).

      - how can 96 hits have been retrieved for spuma-like envs (line 506), while it was earlier reported (line 333), that the most hits were gamma-like?

      We thank the reviewer for the observation. We have clarified how the 96 hits for spuma-like envs were retrieved in Lines 670-677 of the Discussion section.

      English grammar should be improved throughout the manuscript.

      And I could not open half of the supplementary files.

      As suggested, we have revised the English and checked that all files open correctly.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Chabukswar et al. describes a comprehensive attempt to identify and describe the diversity of retroviral envelope (env) gene sequences present in primate genomes in the form of ancient endogenous retrovirus (ERV) sequences.

      Strengths:

      The focus on env can be justified because of the role the Env proteins likely played in determining viral tropism and host range of the viruses that gave rise to the ERV insertions, and to a lesser extent, because of the potential for env ORFs to be coopted for cellular functions (in the rare cases where the ORF is still intact and capable of encoding a functional Env protein). In particular, these analyses can reveal the potential roles of recombination in giving rise to novel combinations of env sequences. The authors began by compiling env sequences from the human genome (from human endogenous retrovirus loci, or "HERVs") to build consensus Env protein sequences, and then they use these as queries to screen other primate genomes for group-specific envs by tBLASTn. The "groups" referred to here are previously described, as unofficial classifications of endogenous retrovirus sequences into three very broad categories - Class I, Class II and Class III. These are not yet formally recognized in retroviral taxonomy, but they each comprise representatives of multiple genera, and so would fall somewhere between the Family and Genus levels. The retrieved sequences are subject to various analyses, most notably they are screened for evidence of recombination. The recombinant forms appear to include cases that were probably viral dead-ends (i.e. inactivating the env gene) even if they were propagated in the germline.

      The availability of the consensus sequences (supplement) is also potentially useful to others working in this area.

      Weaknesses:

      The weaknesses are largely in presentation. Discussions of ERVs are always complicated by the lack of a formal and consistent nomenclature and the confusion between ERVs as loci and ERVs as indirect information about the viruses that produced them. For this reason, additional attention needs to be paid to precise wording in the text and/or the use of illustrative figures.

      We thank the reviewer for the general observation. We have paid additional attention to the wording in the text and figures, and we hope to have improved the clarity of the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Reviewing the manuscript was a challenge because figures were difficult to read. As provided, the fonts were sometimes too small to read in a standard layout and had to be expanded on screen.

      The tree in Figure 3 could also be made easier to read, for example if the authors collapsed related branches and gave the clusters a single, clear label (this is not necessary, just a suggestion) - especially if the supplementary trees have all the labelled branches for any readers who want specific details.

      I also recommend asking a third party (perhaps a scientific colleague) with fluency in English grammar and familiarity with English scientific idiom to provide some editorial feedback on the text.

      Figure 4 legend is confusing. From the description it sounds like the tree in 4B is a host phylogeny, but it's not clearly stated. And if so, how was the tree generated? Is it based on entire genomes? Include at least enough methodological detail or citations that someone could recreate it, if necessary. The details and how it was done should be briefly mentioned here and in detail in the Methods section.

      We thank the reviewer for the observation. We have modified the legend of Figure 4 to state more clearly how the phylogenetic tree of the primate genomes was generated using TimeTree. We have also provided further details in the Methods section (Lines 475-489).

      As suggested, we have revised the English.

      Line 42 - what is "pathological domestication"? It sounds like a contradiction in terms.

      We thank the reviewer for the observation. We have modified the sentence and provided a clearer explanation of the pathological and physiological consequences of ERV env (Lines 52-57).

      Lines 166-167 - the authors use the word "classes" but then use a list of terms that correspond to genera within the Retroviridae. The authors should be cautious here, as "class" and "genus" are both official taxonomic terms with different meanings. Do they mean genus? Or, if a more informal term is needed, perhaps "group"?

      Thank you for the observation. ERVs are classified into three classes (Class I, II, and III) based on their relatedness to exogenous retroviruses of the genera Gammaretrovirus, Betaretrovirus, and Spumaretrovirus, respectively. We have therefore used these terms in the manuscript following the nomenclature proposed by Gifford et al. (2018), which is cited at Lines 122-125.

      Line 221- "defferent" should be "different"

      Corrected

      Lines 233-234 - what is meant by "canonical" and "non-canonical" forms? Can the authors please define these two terms?

      Thank you for the question. "Canonical" refers to sequences that are well preserved and match the structural and functional features of complete env genes, whereas "non-canonical" refers to sequences with significant structural alterations or truncations that deviate from this typical form. This definition has been added to the revised version at Lines 475-479.

      Line 252 - if/is

      Corrected

      Lines 274-276 needs a citation to the paper(s) that reported this.

      Corrected

      Lines 283-285 - this was confusing. How could the authors have noted distinct occurrences and clusters of these if they were excluded from the BLAST analysis? It says the consensus sequences were effectively representing these, but doesn't this raise the possibility that the consensus sequences are not specific enough? Could this also then lead to false identification? Perhaps a few more words to explain should be added.

      We thank the reviewer for the observation. When performing the tBLASTn search, we did obtain hits for HERV15, HERVR, ERVV1, ERVV2, and PABL; a detailed explanation of this observation is now provided in the revised manuscript at lines 619-627.

      Line 298 - missing comma

      Corrected

      Lines 348-351- this list is not a list of recombination mechanisms. Template switching is a mechanism of recombination, but "acquisition" is simply a generic term, "degradation" is not a mechanism, and "cross-species transmission" might be a driver or a result of recombination, but it is not a mechanism of recombination.

      We thank the reviewer for the observation. We have revised the explanation of the recombination events in the Discussion section, to which some parts of the Results have been moved (Lines 1058-1065).

      Lines 369-372. It's not clear why this means the event was a "very recent occurrence". Do the authors mean that there were shared integration sites between some of the species, and that these sites lacked the insertions in other species (e.g. gibbon, orangutan, monkeys)?

      For the long section on recombination events involving an env sequence with an LTR in it, can the authors explain how they know when it's a recombination event versus integration of one provirus into another one, followed by recombination between LTRs to generate a solo-LTR?

      We thank the reviewer for the observation. Regarding the very recent occurrence of the recombination event, we have explained this in the revised manuscript at lines 769-824 as follows: “In fact, the recombinant sequences were shared only between 4 species of Catarrhini parvorder and were absent in more distantly related primates (such as gibbons, orangutans, etc.). This with the presence of shared recombination sites suggests that the insertion occurred after the divergence of these species, while its absence in others indicate that it is a recombination event.”

      Regarding the env-LTR recombination events, the recombinants were first detected with the RDP software and then validated by BLAT searches against the genomes available on the Genome Browser. An explanation of how we obtained these env-LTR recombination events is now provided in lines 746-763 of the revised manuscript.

      Methods Lines 151-168 and Figure 1 legend Lines 689-690 - how did the authors distinguish between "translated regions" corresponding to the actual Env protein sequence from translation of the other two reading frames? That is, there must have been substantial "translatable" stretches of sequence in the two incorrect reading frames as well as the reading frame corresponding to Env, so the question is how were the correct ones identified for the reconstruction?

      We thank the reviewer for the observation. We have provided a detailed explanation in the Methods section (Lines 335-359).

      Line 495 - "previously reported" should include citation(s) of the prior report(s).

      We thank the reviewer for the observation; we have added the appropriate citations.

      Line 525 - the authors propose that the mechanism "is the co-packaging of different ERVs in a virus particle". First, I assume they meant to say that RNA from different ERVs is co-packaged. Second, isn't it also possible or likely that these could arise from co-packaging of exogenous retrovirus RNAs and recombination, especially if the related exogenous forms were still circulating at the time these things arose?

      We thank the reviewer for the observation. In the revised manuscript, we have modified the proposed mechanism to also include the possibility of co-packaging and recombination of exogenous retrovirus RNAs (lines 1082-1099).

      Line 686 - env should either be italicized (gene) or capitalized (protein), depending on what the authors intended here.

      We thank the reviewer for the observation. We have corrected the typographical error in the new version of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      Retroviruses have been endogenized into the genome of all vertebrate animals. The envelope protein of the virus is not well conserved and acquires many mutations; it can therefore be used to monitor viral evolution. Since they are incorporated into the host genome, they also reflect the evolution of the hosts. In this manuscript the authors have focused their analyses on the env genes of endogenous retroviruses in primates. Important observations include the previously unknown, extensive recombination events between these retroviruses and the discovery of HML species in genomes prior to the splitting of old and new world monkeys.

      Strengths:

      They explored a number of databases and made phylogenetic trees to look at the distribution of retroviral species in primates. The authors provide a strong rationale for their study design, they provide a clear description of the techniques and the bioinformatics tools used.

      Weaknesses:

      The manuscript is based on bioinformatics analyses only. The reference genomes do not reflect the polymorphisms in humans or other primate species. The analyses thus likely underestimate the amount of diversity in the retroviruses. Further experimental verification will be needed to confirm the observations.

      It is not clear which databases were used, but if they have not already been analyzed, ERVmap.com and RepeatMasker contain many ERVs that are not present in the reference genomes. Also, long-range sequencing of the human genome has recently become available, which may also be worth studying for this purpose.

      We thank the reviewer for the observations and comments. We would like to clarify that the intent of the work was to perform a bioinformatics analysis, so wet-lab experimental verification of the observations is outside the scope of the present manuscript. For the aims of the manuscript, we used the NCBI reference genomes. To report the coordinates of the HML supergroup in the squirrel monkey genome and the coordinates of the recombination events identified through BLAT searches, we used the genome assemblies available on the Genome Browser with the RepeatMasker custom track, since it provides well-represented ERV annotations.

      The suggestion to use long-range sequencing of the human genome is an interesting one. In future work we will try to incorporate it into our analysis and to perform experimental verification, since, again, the present work does not include a wet-lab component.

      Reviewer #3 (Recommendations for the authors):

      In a few places the term HERV has been used when describing ERVs in non-human primates. This needs to be corrected.

      We thank the reviewer for the observation. We have checked and accordingly modified the terms in the manuscript wherever necessary.

    1. Author response:

      eLife Assessment

      This study provides a valuable contribution to understanding how negative affect influences food-choice decision making in bulimia nervosa, using a mechanistic approach with a drift diffusion model (DDM) to examine the weighting of tastiness and healthiness attributes. The solid evidence is supported by a robust crossover design and rigorous statistical methods, although concerns about low trial counts, possible overfitting, and the absence of temporally aligned binge-eating measures limit the strength of causal claims. Addressing modeling transparency, sample size limitations, and the specificity of mood induction effects would enhance the study's impact and generalizability to broader populations.

      We thank the Editor and Reviewers for their summary of the strengths of our study, and for their thoughtful review and feedback on our manuscript. We apologize for the confusion in how we described the multiple steps and hierarchical methods we used to ensure that the model reported in the main text was the best fit to the data without overfitting. We are not certain what is meant by “[a]ddressing modeling transparency,” but as described in our response to Reviewer 1 below, we now explain more clearly (with references) that hierarchical estimation allows information to be shared across participants, which improves the reliability and stability of parameter estimates even when the number of trials per individual is small. We have also clarified for the less familiar reader how our Bayesian model selection criterion penalizes models with more parameters (more complex models). Although details about model diagnostics, parameter recoverability, and posterior predictive checks are all provided in the Supplementary Materials, we have clarified how each of these steps ensures that the parameters we estimate are identifiable and interpretable and that the model can reproduce key patterns in the data, supporting its validity. Additionally, we now link to our public GitHub repository, which contains all scripts used to estimate the models. Finally, we have edited the language throughout to remove any implication of causal claims, and we acknowledge the limitation of the small sample size.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using a computational modeling approach based on the drift diffusion model (DDM) introduced by Ratcliff and McKoon in 2008, the article by Shevlin and colleagues investigates whether there are differences between neutral and negative emotional states in:

      (1) The timings of the integration in food choices of the perceived healthiness and tastiness of food options between individuals with bulimia nervosa (BN) and healthy participants.

      (2) The weighting of the perceived healthiness and tastiness of these options.

      Strengths:

      By looking at the mechanistic part of the decision process, the approach has the potential to improve the understanding of pathological food choices. The article is based on secondary research data.

      Weaknesses:

      I have two major concerns and a major improvement point.

      The major concerns deal with the reliability of the results of the DDM (first two sections of the Results, pages 6 and 7), which are central to the manuscript, and the consistency of the results with regards to the identification of mechanisms related to binge eating in BN patients (i.e. last section of the results, page 7).

      (1) Ratcliff and McKoon in 2008 used tasks involving around 1000 trials per participant. The Chen et al. experiment the authors refer to involves around 400 trials per participant. On the other hand, Shevlin and colleagues ask each participant to make two sets of 42 choices with two times fewer participants than in the Chen et al. experiment. Shevlin and colleagues also fit a DDM with additional parameters (e.g. a drift rate that varies according to subjective rating of the options) as compared to the initial version of Ratcliff and McKoon. With regards to the number of parameters estimated in the DDM within each group of participants and each emotional condition, the 5- to 10-fold ratio in the number of trials between the Shevlin and colleagues' experiment and the experiments they refer to (Ratcliff and McKoon, 2008; Chen et al. 2022) raises serious concerns about a potential overfitting of the data by the DDM. This point is not highlighted in the Discussion. Robustness and sensitivity analyses are critical in this case.

      We thank the Reviewer for their thoughtful critique. We agree that a limited number of trials can undermine reliable estimation, which we acknowledge in the Discussion section. However, we used a hierarchical estimation approach, which leverages group-level information to constrain individual-level estimates. Using group-level parameters to inform individual-level estimates reduces the overfitting and noise that can arise when trial counts are low, and the regularization inherent in hierarchical fitting prevents the extreme parameter estimates that noisy or limited data could otherwise produce (Rouder & Lu, 2005). As a result, hierarchical estimation has repeatedly been shown to perform well in settings with low trial counts, including as few as 40 trials per condition (Ratcliff & Childers, 2015; Wiecki et al., 2013), and previous applications of the time-varying DDM to food choice task data have included experiments with as few as 60 trials per condition (Maier et al., 2020). We have added references to these more recent approaches and specifically note their advantages for modeling tasks with fewer trials. Additionally, the successful parameter recovery described in the Supplementary Materials supports the robustness of the estimation procedure and the reliability of our results.

      The authors compare different DDMs to show that the DDM they used to report statistical results in the main text is the best according to the WAIC criterion. This may be viewed as a robustness analysis. However, the other DDM models (i.e. M0, M1, M2 in the supplementary materials) they used to make the comparison have fewer parameters to estimate than the one they used in the main text. Fits are usually expected to follow the rule that the more there are parameters to estimate in a model, the better it fits the data. Additionally, a quick plot of the data in supplementary table S12 (i.e. WAIC as a function of the number of parameters varying by food type in the model - i.e. 0 for M0, 2 for M1, 1 for M2 and 3 for M3) suggests that models M1 and potentially M2 may be also suitable: there is a break in the improvement of WAIC between model M0 and the three other models. I would thus suggest checking how the results reported in the main text differ when using models M1 and M2 instead of M3 (for the taste and health weights when comparing M3 with M1, for τS when comparing M3 with M2). If the differences are important, the results currently reported in the main text are not very reliable.

      We thank the Reviewer for highlighting that it would be helpful for the paper to note explicitly that we selected WAIC as one of two methods to assess model fit specifically because it penalizes model complexity. We now explicitly state that WAIC is more robust than metrics such as AIC or BIC when comparing hierarchical Bayesian models like those in the current study, and that it penalizes model complexity based on the effective number of parameters (Watanabe, 2010). Therefore, more complex models (i.e., those with additional parameters) do not automatically obtain lower WAIC values. Additionally, our second method of assessing model fit, posterior predictive checks, demonstrates that only model M3 reproduces key behavioral patterns present in the empirical data; as described in the Supplementary Materials, M1 and M2 miss those patterns. In summary, we followed best practices for assessing model fit and reliability (Wilson & Collins, 2019): the WAIC comparison (which penalizes models with more parameters) and the posterior predictive checks agree in showing that M3 best fits our data. We have added a sentence to the manuscript stating this explicitly.

      (2) The second main concern deals with the association reported between the DDM parameters and binge eating episodes (i.e. last paragraph of the results section, page 7). The authors claim that the DDM parameters "predict" binge eating episodes (in the Abstract among other places) while the binge eating frequency does not seem to have been collected prospectively. Besides this methodological issue, the interpretation of this association is exaggerated: during the task, BN patients did not make binge-related food choices in the negative emotional state. Therefore, it is impossible to draw clear conclusions about binge eating, as other explanations seem equally plausible. For example, the results the authors report with the DDM may be a marker of a strategy of the patients to cope with food tastiness in order to make restrictive-like food choices. A comparison of the authors' results with restrictive AN patients would be of interest. Moreover, correlating results of a nearly instantaneous behavior (i.e. a couple of minutes to perform the task with the 42 food choices) with an observation made over several months (i.e. binge eating frequency collected over three months) is questionable: the negative emotional state of patients varies across the day without systematically leading patients to engage in a binge eating episode in such states.

      I would suggest in such an experiment to collect the binge craving elicited by each food and the overall binge craving of patients immediately before and after the task. Correlating the DDM results with these ratings would provide more compelling results. Without these data, I would suggest removing the last paragraph of the Results.

      We thank the Reviewer for these interesting suggestions and appreciate the opportunity to clarify that we agree that claims about causal connections between our decision parameters and symptom severity metrics would be inappropriate. Per the Reviewer’s suggestions, we have eliminated the use of the word “predict” to describe the tested association with symptom metrics. We also agree that associations with more time-locked craving ratings and near-instantaneous behavior would be useful, and we have added this as an important direction for future research in the Discussion. However, associating task-based behavior with validated self-report measures of symptom severity over long periods preceding the task visit (e.g., the past 2 weeks in depression, the past month in eating disorders) is common practice in computational psychiatry, psychiatric neuroimaging, and clinical cognitive neuroscience (Hauser et al., 2022; Huys et al., 2021; Wise et al., 2023), and this approach has been used several times specifically with food choice tasks (Dalton et al., 2020; Steinglass et al., 2015). We have revised the language throughout the manuscript to clarify that the results suggest that individuals whose task behavior is more reactive to negative affect tend to be the most symptomatic, but that the results do not allow us to determine whether this reactivity causes the symptoms.

      In response to this Reviewer’s important point that negative affect does not always produce loss-of-control eating in individuals with BN, we now explicitly note that, although several studies employing ecological momentary assessment (EMA) have shown that increases in negative affect significantly increase the likelihood of subsequent loss-of-control eating (Alpers & Tuschen-Caffier, 2001; Berg et al., 2013; Haedt-Matt & Keel, 2011; Hilbert & Tuschen-Caffier, 2007; Smyth et al., 2007), not all loss-of-control eating occurs in the context of negative affect. We also note that future studies should integrate food choice task data collected before and after affect inductions with measures that capture the specific frequency of loss-of-control eating episodes occurring during states of high negative affect.

      (3) My major improvement point is to tone down as much as possible any claim of a link with binge eating across the entire manuscript and to focus more on the restrictive behavior of BN patients in between binge eating episodes (see my second major concern about the methods). Additionally, since this article is a secondary research paper and since some of the authors have already used the task with AN patients, if possible I would run the same analyses with AN patients to test whether there are differences between AN (provided they were of the restrictive subtype) and BN.

      We appreciate the Reviewer’s perspective and suggestions. We have adjusted our language linking loss-of-control eating frequency with decision parameters, and we have added sentences focusing on the implications for the restrictive behavior of patients with BN between binge eating episodes. In the Supplementary Materials, we have added an analysis of the restraint subscale of the EDE-Q and confirmed that it shows no relationship with the parameters of interest. While we agree that additional analyses with AN patients would be of interest, this is outside the scope of the paper. Our team has collected data from individuals with AN using this task, but without any affect induction or measure of affect. Therefore, we have added this important direction for future research to the Discussion.

      Reviewer #2 (Public review):

      Summary:

      Binge eating is often preceded by heightened negative affect, but the specific processes underlying this link are not well understood. The purpose of this manuscript was to examine whether affect state (neutral or negative mood) impacts food choice decision-making processes that may increase the likelihood of binge eating in individuals with bulimia nervosa (BN). The researchers used a randomized crossover design in women with BN (n=25) and controls (n=21), in which participants underwent a negative or neutral mood induction prior to completing a food-choice task. The researchers found that despite no differences in food choices in the negative and neutral conditions, women with BN demonstrated a stronger bias toward considering the 'tastiness' before the 'healthiness' of the food after the negative mood induction.

      Strengths:

      The topic is important and clinically relevant and methods are sound. The use of computational modeling to understand nuances in decision-making processes and how that might relate to eating disorder symptom severity is a strength of the study.

      Weaknesses:

      The sample size was relatively small and may have been underpowered to find differences in outcomes (i.e., food choice behaviors). Participants were all women with BN, which limits the generalizability of findings to the larger population of individuals who engage in binge eating. It is likely that the negative affect manipulation was weak and may not have been potent enough to change behavior. Moreover, it is unclear how long the negative affect persisted during the actual task. It is possible that any increases in negative affect would have dissipated by the time participants were engaged in the decision-making task.

      We thank the Reviewer for their comments on the strengths of the paper, and for highlighting these important considerations regarding the sample demographics and the negative affect induction. As in the original paper, which focused only on final food choice behaviors, we now explicitly acknowledge that the study was powered only to detect small-to-medium group differences in the effect of negative emotion on these final choice behaviors. Regarding the sample demographics, we agree that the study’s inclusion of only female participants is a limitation. Although the original decision for this sampling strategy was informed by data suggesting that bulimia nervosa is roughly six times more prevalent among females than males (Udo & Grilo, 2018), we now note in the Discussion that our female-only sample limits the generalizability of the findings.

      We also agree with the Reviewer’s noted limitations of the negative mood induction, and based on the reviewer’s suggestions, we have added to our original description of these limitations in the Discussion. Specifically, we now note that although the task was completed immediately after the affect induction, the study did not include intermittent mood assessments throughout the choice task, so it is unclear how long the negative affect persisted during the actual task.

      Reviewer #3 (Public review):

      Summary:

      The study uses the food choice task, a well-established method in eating disorder research, particularly in anorexia nervosa. However, it introduces a novel analytical approach - the diffusion decision model - to deconstruct food choices and assess the influence of negative affect on how and when tastiness and healthiness are considered in decision-making among individuals with bulimia nervosa and healthy controls.

      Strengths:

      The introduction provides a comprehensive review of the literature, and the study design appears robust. It incorporates separate sessions for neutral and negative affect conditions and counterbalances tastiness and healthiness ratings. The statistical methods are rigorous, employing multiple testing corrections.

      A key finding - that negative affect induction biases individuals with bulimia nervosa toward prioritizing tastiness over healthiness - offers an intriguing perspective on how negative affect may drive binge eating behaviors.

      Weaknesses:

      A notable limitation is the absence of a sample size calculation, which, combined with the relatively small sample, may have contributed to null findings. Additionally, while the affect induction method is validated, it is less effective than alternatives such as image or film-based stimuli (Dana et al., 2020), potentially influencing the results.

      We agree that the small sample size and specific affect induction method may have contributed to the null model-agnostic behavioral findings. Based on this Reviewer’s and Reviewer 2’s comments, we have added these factors to our original acknowledgements of limitations in the Discussion.

      Another concern is the lack of clarity regarding which specific negative emotions were elicited. This is crucial, as research suggests that certain emotions, such as guilt, are more strongly linked to binge eating than others. Furthermore, recent studies indicate that negative affect can lead to both restriction and binge eating, depending on factors like negative urgency and craving (Leenaerts et al., 2023; Wonderlich et al., 2024). The study does not address this, though it could explain why, despite the observed bias toward tastiness, negative affect did not significantly impact food choices.

      We thank the Reviewer for raising these important points and possibilities. In the supplementary materials, we have added an additional analysis of the specific POMS subscales that comprise the total negative affect calculation that was reported in the original paper (Gianini et al., 2019), and which we now report in the main text. Ultimately, we found that, across both groups, the negative affect induction increased responses related to anger, confusion, depression, and tension while reducing vigor.

      We agree with the Reviewer that factors like negative urgency and cravings are relevant here. The study did not collect any measures of craving, and in response to Reviewer 1 and this Reviewer, we now note in the discussion that replication studies including momentary craving assessments will be important. While we don’t have any measurements of cravings, we did measure negative urgency. Despite these prior findings, the original paper (Gianini et al., 2019) did not find that negative urgency was related to restrictive food choices. We have now repeated those analyses, and we also were unable to find any meaningful patterns. Nonetheless, we have added an analysis of negative urgency scores and decision parameters to the supplementary materials.      

      References

      Alpers, G. W., & Tuschen-Caffier, B. (2001). Negative feelings and the desire to eat in bulimia nervosa. Eating Behaviors, 2(4), 339–352. https://doi.org/10.1016/S1471-0153(01)00040-X

      Berg, K. C., Crosby, R. D., Cao, L., Peterson, C. B., Engel, S. G., Mitchell, J. E., & Wonderlich, S. A. (2013). Facets of negative affect prior to and following binge-only, purge-only, and binge/purge events in women with bulimia nervosa. Journal of Abnormal Psychology, 122(1), 111–118. https://doi.org/10.1037/a0029703

      Dalton, B., Foerde, K., Bartholdy, S., McClelland, J., Kekic, M., Grycuk, L., Campbell, I. C., Schmidt, U., & Steinglass, J. E. (2020). The effect of repetitive transcranial magnetic stimulation on food choice-related self-control in patients with severe, enduring anorexia nervosa. International Journal of Eating Disorders, 53(8), 1326–1336. https://doi.org/10.1002/eat.23267

      Gianini, L., Foerde, K., Walsh, B. T., Riegel, M., Broft, A., & Steinglass, J. E. (2019). Negative affect, dietary restriction, and food choice in bulimia nervosa. Eating Behaviors, 33, 49–54. https://doi.org/10.1016/j.eatbeh.2019.03.003

      Haedt-Matt, A. A., & Keel, P. K. (2011). Revisiting the affect regulation model of binge eating: A meta-analysis of studies using ecological momentary assessment. Psychological Bulletin, 137(4), 660–681. https://doi.org/10.1037/a0023660

      Hauser, T. U., Skvortsova, V., De Choudhury, M., & Koutsouleris, N. (2022). The promise of a model-based psychiatry: Building computational models of mental ill health. The Lancet Digital Health, 4(11), e816–e828. https://doi.org/10.1016/S2589-7500(22)00152-2

      Hilbert, A., & Tuschen-Caffier, B. (2007). Maintenance of binge eating through negative mood: A naturalistic comparison of binge eating disorder and bulimia nervosa. International Journal of Eating Disorders, 40(6), 521–530. https://doi.org/10.1002/eat.20401

      Huys, Q. J. M., Browning, M., Paulus, M. P., & Frank, M. J. (2021). Advances in the computational understanding of mental illness. Neuropsychopharmacology, 46(1), 3–19. https://doi.org/10.1038/s41386-020-0746-4

      Maier, S. U., Raja Beharelle, A., Polanía, R., Ruff, C. C., & Hare, T. A. (2020). Dissociable mechanisms govern when and how strongly reward attributes affect decisions. Nature Human Behaviour, 4(9), 949–963. https://doi.org/10.1038/s41562-020-0893-y

      Ratcliff, R., & Childers, R. (2015). Individual differences and fitting methods for the two-choice diffusion model of decision making. Decision, 2(4), 237–279. https://doi.org/10.1037/dec0000030

      Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12(4), 573–604. https://doi.org/10.3758/BF03196750

      Smyth, J. M., Wonderlich, S. A., Heron, K. E., Sliwinski, M. J., Crosby, R. D., Mitchell, J. E., & Engel, S. G. (2007). Daily and momentary mood and stress are associated with binge eating and vomiting in bulimia nervosa patients in the natural environment. Journal of Consulting and Clinical Psychology, 75(4), 629–638. https://doi.org/10.1037/0022-006X.75.4.629

      Steinglass, J., Foerde, K., Kostro, K., Shohamy, D., & Walsh, B. T. (2015). Restrictive food intake as a choice—A paradigm for study. International Journal of Eating Disorders, 48(1), 59–66. https://doi.org/10.1002/eat.22345

      Udo, T., & Grilo, C. M. (2018). Prevalence and correlates of DSM-5–defined eating disorders in a nationally representative sample of U.S. adults. Biological Psychiatry, 84(5), 345–354. https://doi.org/10.1016/j.biopsych.2018.03.014

      Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11, 3571–3594.

      Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. Frontiers in Neuroinformatics, 7, 14. https://doi.org/10.3389/fninf.2013.00014

      Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547. https://doi.org/10.7554/eLife.49547

      Wise, T., Robinson, O. J., & Gillan, C. M. (2023). Identifying transdiagnostic mechanisms in mental health using computational factor modeling. Biological Psychiatry, 93(8), 690–703. https://doi.org/10.1016/j.biopsych.2022.09.034

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study provides a preliminary characterization of the role of ACVR2A in trophoblast cell function, including its effects on migration, invasion, proliferation, and clonal formation, as well as its downstream signaling pathways.

      Strengths:

      The use of multiple experimental techniques, such as CRISPR/Cas9-mediated gene knockout, RNA-seq, and functional assays (e.g., Transwell, colony formation, and scratch assays), is commendable and demonstrates the authors' effort to elucidate the molecular mechanisms underlying ACVR2A's regulation of trophoblast function. The RNA-seq analysis and subsequent GSEA findings offer valuable insights into the pathways affected by ACVR2A knockout, particularly the Wnt and TCF7/c-JUN signaling pathways.

      Weaknesses:

      The molecular mechanisms underlying this study require further exploration through additional experiments. While the current findings provide valuable insights into the role of ACVR2A in trophoblast cell function and its involvement in the regulation of migration, invasion, and proliferation, further validation in both in vitro and in vivo models is needed. Additionally, more experiments are required to establish the functional relevance of the TCF7/c-JUN pathway and its clinical significance, particularly in relation to pre-eclampsia. Additional techniques, such as animal models and more advanced clinical sample analyses, would help strengthen the conclusions and provide a more comprehensive understanding of the molecular pathways involved.

      Reviewer #2 (Public review):

      Summary:

      ACVR2A is one of a handful of genes for which significant correlations between associated SNPs and the incidence of preeclampsia have been found in multiple populations. It is one of the TGFB family receptors, and multiple ligands of ACVR2A, as well as its coreceptors and related inhibitors, have been implicated in placental development, trophoblast invasion, and embryo implantation. This useful study builds on this knowledge by showing that ACVR2A knockout in trophoblast-related cell lines reduces trophoblast invasion, which could tie together many of these observations. Support for this finding is incomplete, as reduced proliferation may be influencing the invasion results. The implication of cross-talk between the WNT and ACVR2A/SMAD2 pathways is an important contribution to the understanding of the regulation of trophoblast function.

      Strengths:

      (1) ACVR2A is one of very few genes implicated in preeclampsia in multiple human populations, yet its role in pathogenesis remains poorly studied, and this study begins to address that gap in our knowledge.

      (2) ACVR2A is also indirectly implicated in trophoblast invasion and trophoblast development via its connections to many ligands, inhibitors, and coreceptors, suggesting its potential importance.

      (3) The authors have used multiple cell lines to verify their most important observations.

      Weaknesses:

      (1) There are a number of claims made in the introduction without attribution. For example, there are no citations for the claims that family history is a significant risk factor for PE, that inadequate trophoblast invasion of spiral arteries is a key factor, and that immune responses and renin-angiotensin activity are involved.

      Thank you for pointing out the lack of citations in some parts of the introduction. We have revised the manuscript to include appropriate references for the claims regarding family history as a risk factor for PE, the role of inadequate trophoblast invasion in spiral arteries, and the involvement of immune responses and the renin-angiotensin system. The revised text now includes citations to well-established studies in the field (Salonen Ros et al., 2000; Chappell et al., 2021; Brosens et al., 2002; Knöfler et al., 2019; Redman et al., 1999; LaMarca et al., 2008). We believe these additions improve the scientific rigor of the manuscript.

      (2) The introduction states "As a receptor for activin A, ACVR2A..." It's important to acknowledge that ACVR2A is also the receptor for other TGFB family members, with varying affinities and coreceptors. Several TGFB family members are known to regulate trophoblast differentiation and invasion. For example, BMP2 likely stimulates trophoblast invasion at least in part via ACVR2A (PMID 29846546).

      Thank you for highlighting the broader role of ACVR2A as a receptor for multiple members of the TGF-β superfamily. We have revised the introduction to acknowledge that ACVR2A is not only the receptor for activin A but also interacts with other ligands, such as BMP2, which likely stimulates trophoblast invasion via ACVR2A (PMID: 29846546). This addition provides a more comprehensive view of ACVR2A's function in trophoblast biology. While the focus of our current study is on activin A, we agree that ACVR2A's role in mediating the effects of other TGF-β family members is an important topic for future research.

      (3) An alternative hypothesis for the potential role of ACVR2A in preeclampsia is its function in the endometrium. In the mouse, knockout of ACVR2A in the uterus (and other progesterone receptor-expressing cells) leads to embryo implantation failure.

      Thank you for bringing up the potential role of ACVR2A in the endometrium as an alternative hypothesis. We have revised the discussion to acknowledge this possibility and cited relevant studies showing that uterine-specific knockout of ACVR2A in mice leads to embryo implantation failure (Monsivais et al., 2021). This suggests that ACVR2A may play a critical role in uterine receptivity and embryo implantation, which could influence placental development and preeclampsia pathogenesis. While our current study focuses on trophoblast-related functions of ACVR2A, we agree that investigating its role in the uterine environment is an important direction for future research.

      (4) In the description of the patient population for placental sample collections, preeclampsia is defined only by hypertension, and this is described as being in accordance with ACOG guidelines. ACOG requires a finding of hypertension in combination with either proteinuria or one of the following: thrombocytopenia, elevated creatinine, elevated liver enzymes, pulmonary edema, or new-onset headache unresponsive to medication.

      We appreciate the reviewer’s detailed observation regarding the definition of preeclampsia.

      We have reviewed and clarified our description of the diagnostic criteria based on the American College of Obstetricians and Gynecologists (ACOG) guidelines. Specifically, we have revised the definition in the Materials and Methods section under "Collection of Placenta and Decidua Specimens," as follows: In accordance with the guidelines from the American College of Obstetricians and Gynecologists (ACOG, 2023), preeclampsia (PE) is diagnosed as hypertension (systolic blood pressure ≥140 mmHg or diastolic blood pressure ≥90 mmHg on at least two occasions) in combination with one or more of the following: proteinuria (≥300 mg/24-hour urine collection or protein/creatinine ratio ≥0.3), thrombocytopenia, elevated serum creatinine, elevated liver enzymes, pulmonary edema, or new-onset headache unresponsive to treatment.
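      As an illustration only, the diagnostic rule stated above can be written as a small decision function. This Python sketch is not part of the study's methods: the `PatientFindings` fields and helper names are hypothetical, the non-proteinuria criteria are treated as pre-assessed yes/no findings because their laboratory cut-offs are not given here, and the timing and gestational-age conditions of the full ACOG guideline are not checked.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PatientFindings:
    """Hypothetical container for the findings named in the definition above."""
    # Blood-pressure readings as (systolic, diastolic) pairs in mmHg.
    bp_readings: List[Tuple[float, float]] = field(default_factory=list)
    proteinuria_mg_24h: Optional[float] = None
    protein_creatinine_ratio: Optional[float] = None
    # Assumption: these are already judged abnormal/normal elsewhere.
    thrombocytopenia: bool = False
    elevated_serum_creatinine: bool = False
    elevated_liver_enzymes: bool = False
    pulmonary_edema: bool = False
    refractory_headache: bool = False


def is_hypertensive(readings):
    """True if at least two readings have SBP >= 140 or DBP >= 90 mmHg."""
    elevated = [r for r in readings if r[0] >= 140 or r[1] >= 90]
    return len(elevated) >= 2


def has_proteinuria(f):
    """>= 300 mg per 24-hour collection or protein/creatinine ratio >= 0.3."""
    by_24h = f.proteinuria_mg_24h is not None and f.proteinuria_mg_24h >= 300
    by_ratio = f.protein_creatinine_ratio is not None and f.protein_creatinine_ratio >= 0.3
    return by_24h or by_ratio


def meets_pe_definition(f):
    """Hypertension plus at least one of the additional criteria."""
    if not is_hypertensive(f.bp_readings):
        return False
    return any([
        has_proteinuria(f),
        f.thrombocytopenia,
        f.elevated_serum_creatinine,
        f.elevated_liver_enzymes,
        f.pulmonary_edema,
        f.refractory_headache,
    ])
```

      Hypertension alone never satisfies the function, which mirrors the point raised by the reviewer.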

      (5) I believe that Figures 1a and 1b are data from a previously published RNAseq dataset, though it is not entirely clear in the text. The methods section does not include a description of the analysis of these data undertaken here. It would be helpful to include at least a brief description of the study these data are taken from - how many samples, how were the PE/control groups defined, gestational age range, where is it from, etc. For the heatmap presented in B, what is the significance of the other genes/ why are they being shown? If the purpose of these two panels is to show differential expression specifically of ACVR2A in this dataset, that could be shown more directly.

      Clarification of the RNA-seq dataset: The Methods section has been revised to specify the dataset source (GEO accession number: GSE114691), which includes 20 PE and 21 control placental samples with gestational ages ranging from 34 to 38 weeks. PE and control groups were defined using clinical criteria such as hypertension and proteinuria, and these details have also been added to the Results section.

      RNA-seq analysis: We have included details of the differential gene expression analysis in the Methods section. Specifically, the DESeq2 R package was used, with thresholds of FDR < 0.05 and |log2(fold change)| ≥ 1. The selection of WNT pathway-related genes in Figure 1B is based on these analyses.

      Significance of the heatmap genes: The genes displayed in Figure 1B were selected based on their significant differential expression and enrichment in pathways relevant to PE pathogenesis, such as the WNT signaling pathway. We have clarified this in the Results section and updated the figure legend to explain their biological relevance.

      Purpose of Figures 1A and 1B: Figure 1A emphasizes the downregulation of ACVR2A in PE placentas, while Figure 1B complements this by presenting differentially expressed genes associated with the WNT pathway. Together, these figures highlight the role of ACVR2A in PE and its connection to broader molecular pathways. The text has been updated to improve clarity and focus.
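      The thresholds above amount to a simple filter on a differential-expression results table. DESeq2 itself runs in R; this Python sketch only illustrates applying FDR < 0.05 and |log2(fold change)| ≥ 1 to exported results. The `GeneResult` type, its field names, and any gene labels used with it are hypothetical placeholders, not data from GSE114691, and the up/down split assumes the contrast is PE relative to control.

```python
import math
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    log2_fold_change: float
    padj: float  # FDR-adjusted p-value; NaN where DESeq2 reports NA


def select_degs(results, fdr=0.05, min_abs_lfc=1.0):
    """Keep genes with padj < fdr and |log2 fold change| >= min_abs_lfc."""
    return [
        r for r in results
        if not math.isnan(r.padj)
        and r.padj < fdr
        and abs(r.log2_fold_change) >= min_abs_lfc
    ]


def split_by_direction(degs):
    """Split selected genes into up- and down-regulated lists (assumed PE vs. control)."""
    up = [r for r in degs if r.log2_fold_change > 0]
    down = [r for r in degs if r.log2_fold_change < 0]
    return up, down
```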

      (6) More information is needed in the methods section to understand how the immunohistochemistry was quantified. "Quantitation was performed" is all that is provided. Was staining quantified across the whole image or only in anchoring villous areas? How were HRP & hematoxylin signals distinguished in ImageJ? How was the overall level of HRP/DAB development kept constant between the NC and PE groups?

      Thank you for pointing out the need for more details regarding the quantification of immunohistochemistry (IHC). We have now clarified and expanded the description of the IHC quantification process in the Methods section as follows:

      Quantification across the entire section: IHC staining was assessed across the entire tissue section to account for global expression patterns. For quantitative analysis, representative regions from the anchoring villous areas, where ACVR2A expression is most prominent, were selected for comparison between NC and PE groups. This ensured that the analysis focused on biologically relevant regions.

      ImageJ analysis: Images of stained sections were captured under identical magnification and lighting conditions. Hematoxylin (blue, nuclear staining) and DAB/HRP (brown, protein-specific signal) were distinguished using ImageJ's color deconvolution plugin. The DAB/HRP signal was isolated and quantified based on the integrated optical density (IOD) within the selected regions.

      Consistency in HRP/DAB development: To maintain consistency between NC and PE groups, all tissue samples were processed under identical experimental conditions, including the same antibody dilution, incubation times, and DAB/HRP development durations. Negative controls (without primary antibody) were included to monitor background staining, and the DAB reaction was stopped simultaneously across all samples to avoid overdevelopment.

      Statistical analysis: The quantified DAB signal intensity was normalized to the area of the selected regions, and comparisons between NC and PE groups were performed using statistical tests (e.g., Student’s t-test). Results are reported as mean ± SD.

      We hope this additional detail addresses your concerns.
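      The normalization and group comparison described under the statistical analysis can be sketched as follows. This assumes per-sample DAB IOD and region-area values have already been exported from ImageJ after color deconvolution (that step is not reproduced here), and all function names are hypothetical. The sketch computes the equal-variance (Student's) two-sample t statistic and its degrees of freedom.

```python
import math
from statistics import mean, stdev


def normalized_intensity(iod, area):
    """DAB integrated optical density (IOD) divided by the area of the measured region."""
    if area <= 0:
        raise ValueError("region area must be positive")
    return iod / area


def student_t(group_a, group_b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two samples")
    df = na + nb - 2
    pooled_var = ((na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / se, df


def mean_sd(values):
    """Format a group summary as mean ± SD (sample standard deviation)."""
    return f"{mean(values):.2f} ± {stdev(values):.2f}"
```

      In practice, `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` corresponds to Student's test, returns the same statistic together with a p-value.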

      (7) In Figure 1E it is not immediately obvious to many readers where the EVT are. It is probably worth circling or putting an arrow to the little region of ACVR2A+ EVT that is shown in the higher magnification image in Figure 1E. These are actually easier to see in the pictures provided in the supplement Figure 1. Of note, the STB is also staining positive. This is worth pointing out in the results text.

      Thank you for your suggestion regarding Figure 1E. To make the location of the ACVR2A+ extravillous trophoblasts (EVTs) more apparent, we have updated Figure 1E by adding arrows to indicate the regions of EVTs in the higher magnification image. Additionally, we have included annotations in the supplemental Figure S1 to further aid visualization. We appreciate your observation that syncytiotrophoblasts (STBs) also show positive staining for ACVR2A. We have revised the Results section to explicitly mention this finding and its potential significance.

      (8) It is not possible to judge whether the IF images in 1F actually depict anchoring villi. The DAPI is really faint, and it's high magnification, so there isn't a lot of context. Would it be possible to include a lower magnification image that shows where these cells are located within a placental section? It is also somewhat surprising that this receptor is expressed in the cytoplasm rather than at the cell surface. How do the authors explain this?

      Thank you for your suggestion to provide more context for the immunofluorescence (IF) images in Figure 1F. To address this, we have included lower magnification images in Supplementary Figure S2, showing the overall structure of the placental section and the location of the anchoring villi. These images help to contextualize the regions analyzed in Figure 1F, which were selected to clearly illustrate ACVR2A expression in extravillous trophoblasts (EVTs). In Figure 1F, we have focused on higher magnification images for better visualization of ACVR2A staining patterns in EVTs. Regarding the subcellular localization of ACVR2A, the receptor is predominantly expressed on the cell surface, as shown in our images. However, some intracellular staining is also observed, which may reflect receptor trafficking or recycling processes, consistent with the behavior of other activin receptors under physiological or pathological conditions. We have clarified these points in the Results and Discussion sections.

      (9) The results text makes it sound like the data in Figure 2A are from NCBI & Protein atlas, but the legend says it is qPCR from this lab. The methods do not detail how these various cell lines were grown; only HTR-SVNeo cell culture is described. Similarly, JAR cells are used for several experiments and their culture is not described.

      Thank you for pointing out the need for clarification regarding Figure 2A and cell culture methods. The data in Figure 2A were generated using RT-qPCR conducted in our laboratory, not solely based on data from NCBI or the Human Protein Atlas. We have revised the Results section to reflect this more accurately. Regarding the culture conditions, we acknowledge that the methods for other cell lines were not explicitly detailed. For this study, all cell lines, including JAR and other cancer cell lines, were cultured following standard protocols provided by the suppliers. Specifically, JAR cells and other cell lines were purchased from Wuhan Punosei Life Technology and were maintained in RPMI-1640 medium supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin under standard conditions (37°C, 5% CO₂). This information has been added to the Methods section for clarity.

      (10) Under RT-qPCR methods, the phrase "cDNA reverse transcription cell RNA was isolated..." does not make any sense.

      Thank you for pointing out the unclear phrasing in the RT-qPCR methods section. We agree that the original description was not precise. To address this, we have revised the relevant section to improve clarity and accuracy. Specifically, the methods now explicitly describe the two key steps: RNA isolation and cDNA synthesis. The revised text reads: Total RNA was isolated from cells using a Total RNA Extraction Kit (TIANGEN, China) following the manufacturer’s instructions. The extracted RNA was reverse-transcribed into complementary DNA (cDNA) using a cDNA Synthesis Kit (Takara, Japan) according to the protocol provided by the manufacturer.

      (11) The paragraph beginning "Consequently, a potential association..." is quite confusing. It mentions analyzing ACVR2A expression in placentas, but then doesn't point to any results of this kind and repeats describing the results in Figure 2a, from various cell lines.

      Thank you for your comment regarding the paragraph beginning with "Consequently, a potential association...". We understand that the current wording may create confusion. The primary aim of this section is to compare ACVR2A expression levels across various cell lines, including trophoblast-derived and non-trophoblast cell lines, to highlight the relevance of ACVR2A in trophoblast function, particularly in invasion and migration. To address your concerns, we have revised the paragraph for clarity and logical flow. The updated text explicitly focuses on the comparison of ACVR2A expression across cell lines (Figure 2A) and how this supports the hypothesis that ACVR2A plays a key role in trophoblast invasion and migration. Additionally, the discussion of placental samples has been separated to avoid confusion with cell line results. We hope this revision resolves the issue.

      (12) The authors should acknowledge that the effect of the ACVR2A knockout on proliferation makes it difficult to draw any conclusions from the trophoblast invasion assays. That is, there might be fewer migrating or invading cells in the knockout lines because there are fewer cells, not because the cells that are there are less invasive. Since this is a central conclusion of the study, it is a major drawback.

      Thank you for highlighting this important point. We agree that the reduced proliferation observed in ACVR2A knockout cells could influence the results of the invasion assays, as fewer cells may inherently lead to reduced invasion. To minimize this effect, we conducted the invasion and migration assays under low-serum conditions (1–2% serum) to limit cell proliferation during the experimental timeframe. This approach was based on optimization trials and existing literature, as serum-free conditions were found to negatively impact cell viability and experimental reproducibility. While these efforts helped to mitigate the impact of proliferation on the results, we acknowledge this as a limitation of our study and have added this discussion to the manuscript. Future studies could incorporate approaches such as normalizing cell numbers or using additional proliferation-independent methods to confirm the findings. We hope this clarification and the steps taken address your concerns.

      (13) The legend and the methods section do not agree on how many fields were selected for counting in the transwell invasion assays in Figure 3C. The methods section and the graph do not match the number of replicate experiments in Figure 3D (the number of replicate experiments isn't described for 3C).

      Thank you for pointing out the inconsistencies regarding the number of fields counted and the number of replicates in the Transwell invasion assays (Figure 3C) and colony formation assays (Figure 3D). We apologize for the lack of clarity in the Methods section and figure legend. To address this, we have revised both the figure legends and the Methods section for consistency and added detailed descriptions. For Figure 3C, cell invasion was quantified by randomly selecting 5 fields of view per sample under 300× magnification. Images shown in the figure were taken at lower magnification to provide a better visual comparison between experimental and control groups. For Figure 3D, each experiment was independently repeated at least 10 times to ensure robust and reproducible results. These clarifications have been incorporated into the revised manuscript. We appreciate your feedback and believe this revision improves the clarity and transparency of our methods.

      (14) Discussion says "Transcriptome sequencing analysis revealed low ACVR2A expression in placental samples from PE patients, consistent with GWAS results across diverse populations." The authors should explain this briefly. Why would SNPs in ACVR2A necessarily affect levels of the transcript?

      Thank you for raising this important point. We acknowledge that our study did not directly investigate how SNPs in the ACVR2A gene affect transcript levels. However, prior studies have suggested that SNPs can influence gene expression through various mechanisms. For example, SNPs in regulatory regions (e.g., promoters, enhancers, or untranslated regions) may affect transcription factor binding, RNA stability, or splicing efficiency, ultimately altering transcript levels. While we did not directly assess the functional consequences of ACVR2A SNPs in this study, the observed downregulation of ACVR2A in PE placentas aligns with the potential regulatory impact of SNPs previously identified by GWAS. To address this, we have revised the Discussion section to clarify the relationship between SNPs and transcript levels and acknowledge this limitation.

      (15) "The expression levels of ACVR2A mRNA were comparable to those of tumor cells such as A549. This discovery suggested a potential pivotal role of ACVR2A in the biological functions of trophoblast cells, especially in the nurturing layer." Alternatively, ACVR2A expression resembles that of tumors because the cell lines used here are tumor cells (JAR) or immortalized cells (HTR8). These lines are widely used to study trophoblast properties, but the discussion should at least acknowledge the possibility that the behavior of these cells does not always resemble normal trophoblasts.

      Thank you for pointing out this important limitation. We agree that the JAR and HTR8/SVneo cell lines, being tumor-derived or immortalized, may not fully replicate the behavior of normal trophoblast cells. While these cell lines are widely used as models for studying trophoblast properties due to their ease of culture and invasive behavior, their gene expression and signaling pathways could partially reflect their tumorigenic or immortalized origins. We have revised the Discussion section to acknowledge this limitation and clarify the interpretation of ACVR2A expression levels in these cells.

      (16) The authors should discuss some of what is known about the relationship between the TCF7/c-JUN pathway and the major signaling pathway activated by ACVR2A, Smad 2/3/4. The Wnt and TGFB family cross-talk is quite complex and it has been studied in other systems.

      Thank you for highlighting the relationship between the TCF7/c-JUN pathway and Smad2/3/4 signaling. In our study, we chose to focus on Smad1/5 due to its strong association with ACVR2A in placental development, as demonstrated in a recent study (DOI: 10.1038/s41467-021-23571-5). This study showed that the BMP signaling pathway, mediated through ACVR2A-Smad1/5, is essential for endometrial receptivity and embryo implantation. While Smad2/3/4 are well-established mediators of TGF-β signaling, Smad1/5 activation is more directly linked to ACVR2A in the context of reproductive biology.

      In PE placentas, we observed a significant downregulation of Smad1/5 expression, which supports the hypothesis that ACVR2A-mediated Smad signaling is disrupted in this condition. Although we did not directly assess Smad2/3/4 in this study, prior research has shown that Smad2/3 can interact with TCF/LEF transcription factors to regulate Wnt-related target genes, suggesting potential cross-talk between these pathways. We have now clarified this rationale and included a discussion of these interactions in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Several points need to be addressed to improve the clarity and robustness of the presented findings:

      (1) From a clinical perspective, several concerns arise regarding the interpretation of these findings. First, the small sample size of 20 patients may not be representative of the broader population, limiting the generalizability of the results. Additionally, although no significant differences in age and pre-pregnancy BMI were observed between the PE and normal control groups, other clinical variables, such as hypertension or gestational diabetes, may also influence ACVR2A expression and contribute to PE development. Furthermore, while the study suggests a correlation between reduced ACVR2A expression and PE, it remains unclear whether this association holds true across different subtypes of PE or whether there are other underlying clinical factors that could account for these changes in gene expression. These factors need to be considered in future studies to better understand the clinical relevance of ACVR2A in PE.

      Thank you for raising these insightful concerns about the clinical interpretation of our findings. We agree that the small sample size of 20 patients may limit the generalizability of our results. To address this, we are actively expanding our cohort by collecting additional clinical samples from PE patients and normotensive controls. This effort aims to strengthen the robustness of our findings and provide stronger evidence for the role of ACVR2A in PE. We would also like to clarify that, during the initial sample collection, we specifically included only PE patients without comorbidities such as gestational diabetes, chronic hypertension, or other pregnancy-related complications. This strict selection criterion was implemented to minimize the potential influence of confounding clinical variables and ensure that our findings specifically reflect the association between ACVR2A expression and PE. While our study provides important initial insights, we recognize the need for larger-scale studies to validate these findings. The ongoing collection of clinical samples will allow us to address this limitation and enhance the translational relevance of our research. We have revised the manuscript to reflect these points and highlight our plans to strengthen the study by increasing the sample size.

      (2) The section "Precision Genome Surgery: ACVR2A Knockout via CRISPR/Cas9" in the results contains some issues with expression details. The results section should be more structured, with data presented in a more detailed and clear manner, ensuring that there is a clear connection between each experimental step and its corresponding result. For example, the sentence "Following multiple rounds of monoclonal culture, genotype identification, RT-qPCR and Western blotting (WB) analysis for screening, specific double-knockout monoclonal cell lines were distinctly chosen" contains redundant phrasing and unnecessary details, which affect the flow of the text.

      Thank you for your constructive feedback on the “Precision Genome Surgery: ACVR2A Knockout via CRISPR/Cas9” section. We agree that this section can be better structured to present the data in a more detailed and coherent manner. To address this, we have reorganized the results into distinct steps, ensuring a clear connection between each experimental step and its corresponding result. Redundant phrasing has been removed to improve the flow and readability of the text. The revised section emphasizes the purpose of each step, the screening process, and the specific results obtained.

      (3) The figure legends and panel labels in Figure 3 should be revised to ensure clarity and consistency. The figure legend should specify the exact panels (e.g., Figure 3A, 3B, 3C, etc.) and clearly describe the experimental conditions and results shown in each part.

      Thank you for pointing out the need for improved clarity and consistency in the figure legends and panel labels for Figure 3. We have revised the figure legend to specify each panel (e.g., Figure 3A, 3B, 3C, etc.) and included detailed descriptions of the experimental conditions and results displayed in each part. These updates aim to ensure better understanding and alignment between the figure legend and the panels.

      (4) Lack of In Vivo Validation of ACVR2A Knockout: The study does not include in vivo experiments to validate the effects of ACVR2A knockout. It would be important to investigate whether similar regulatory effects of ACVR2A on trophoblast cell migration and invasion can be observed in animal models or in larger clinical studies. The lack of in vivo data raises questions about the translational relevance of the findings.

      Thank you for highlighting the importance of in vivo validation to assess the translational relevance of our findings. While we acknowledge that in vivo experiments could provide additional insights into the role of ACVR2A in trophoblast migration and invasion, this study was primarily designed as an in vitro investigation to explore the molecular mechanisms underlying ACVR2A function in trophoblast cells. The choice of an in vitro model allowed us to perform precise and controlled mechanistic analyses, which are critical for establishing a foundation for future research. We agree that in vivo studies using animal models or larger clinical cohorts are important next steps to validate the regulatory effects of ACVR2A on trophoblast function and its contribution to PE pathogenesis. These directions will be pursued in future research to further establish the translational potential of our findings. We have included this perspective in the revised Discussion section.

      (5) TCF7/c-JUN Pathway in Clinical Samples: In the study of the TCF7/c-JUN pathway, the authors mention assessing protein expression in clinical samples through immunohistochemistry (IHC). However, the manuscript does not provide a clear explanation of how the findings from laboratory cell models (such as HTR8/SVneo and JAR) relate to the clinical samples. Specifically, while ACVR2A knockout is shown to affect these proteins at the cellular level, it is unclear whether this effect is observed in clinical samples. Therefore, further validation of the TCF7/c-JUN pathway in the cell models and exploration of its relationship with protein expression in clinical samples is necessary. Additional experiments, such as immunofluorescence staining or mass spectrometry, could further confirm the role of the TCF7/c-JUN pathway in cells and provide a more direct comparison with clinical data.

      Thank you for highlighting the need to connect findings from cell models to clinical samples, particularly with respect to the TCF7/c-JUN pathway. In response to your comment, we conducted additional experiments using Western blot analysis to evaluate the expression of ACVR2A, SMAD1/5, SMAD4, pSMAD1/5/9, and TCF7L1/TCF7L2 in PE placental tissues compared to normotensive controls (Figure 7A). The results demonstrated significantly reduced expression of these proteins in PE placentas, providing evidence that disruptions in the ACVR2A-SMAD and TCF7/c-JUN signaling pathways observed in vitro are also present in clinical samples.

      These findings strengthen the translational relevance of our study by directly linking the molecular mechanisms identified in cell models to clinical observations. We have updated the Results and Discussion sections to incorporate these new data, and we believe this addition addresses your concern about the relationship between in vitro and clinical findings.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      The authors have constructively responded to previous referee comments and I believe that the manuscript is a useful addition to the literature. I particularly appreciate the quantitative approach to social behavior, but have two cautionary comments.

      (1) Conceptually it is important to further justify why this particular maximum entropy model is appropriate. Maximum entropy models have been applied across a dizzying array of biological systems, including genes, neurons, the immune system, and animal behavior, so it would be beneficial to explain their particular benefits here, for mouse social behavior as coarse-grained through Eco-HAB chamber occupancy. This would be an excellent chance to amplify what the models can offer for biological understanding, particularly in the realm of social behavior.

      We thank the reviewer for this comment. Maximum entropy models, along with other statistical inference methods that learn interaction patterns from simultaneously measured degrees of freedom, help distinguish various types of interactions, e.g. direct vs. indirect interactions among animals, or an individual's preference for food vs. social interactions between pairs. As research on social behavior expands from focusing on pairs of animals to studying groups in (semi-)naturalistic environments, maximum entropy models serve as a crucial link between high-throughput data and the need to identify and distinguish interaction rules. Specifically, among all possible maximum entropy models, the pairwise maximum entropy model is one of the simplest that can describe interactions among individuals, which makes it an excellent starting point for understanding collective and social behavior in animals.
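      For illustration, a generic pairwise maximum entropy model for $N$ mice can be written as follows. This is a sketch of the model class, not necessarily the exact parameterization used in the manuscript: we assume here that $\sigma_i$ is the compartment occupied by mouse $i$ at a given time, $h_i(\sigma)$ is that mouse's individual preference for compartment $\sigma$ (e.g. one containing food), $J_{ij}$ is the pairwise tendency of mice $i$ and $j$ to co-localize, $\delta_{\sigma_i \sigma_j}$ is the Kronecker delta, and $Z$ is the normalization constant summed over all joint configurations.

$$
P(\sigma_1, \dots, \sigma_N) = \frac{1}{Z} \exp\!\left( \sum_{i=1}^{N} h_i(\sigma_i) + \sum_{i<j} J_{ij}\, \delta_{\sigma_i \sigma_j} \right)
$$

      In this form, a positive $J_{ij}$ favors co-localization of mice $i$ and $j$ beyond what their individual preferences $h_i$ and $h_j$ alone would predict, which is how such a model separates individual preferences from pairwise social interactions. Models with higher-order interactions would add terms coupling three or more mice.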

      Although the Eco-HAB setup currently records spatially coarse-grained data, it still provides more spatial information than the traditional three-chamber tests used to assess sociability in rodents. By showing that the maximum entropy model can effectively analyze Eco-HAB data, we hope to highlight its potential in research on social behavior in animals.

      To amplify what the models can offer for biological understanding, particularly in the realm of social behavior, we have updated the Introduction to give a more logical structure to the motivation for using maximum entropy models to identify interactions among mice. Additionally, we updated the first paragraph of the Discussion to make explicit that it is the use of maximum entropy models that identifies interaction patterns from the high-throughput data. Finally, we have also added arguments to the Discussion (lines 422-425) supporting the specific use of pairwise maximum entropy models to study social behaviors.

      (2) Maximum entropy models of even intermediate-size systems involve a large number of parameters. The authors are transparent about that limitation here, but I still worry that the conclusion of the sufficiency of pairwise interactions is simply not general, and this may also relate to the differences from previous work. If, as the authors suggest in the discussion, this difference is one of a choice of variables, then that point could be emphasized. The suggestion of a follow-up study with a smaller number of mice is excellent.

      We thank the reviewer for raising this issue and agree that the caveat of how generally pairwise interactions can describe the social behavior of animals needs to be discussed. We have added a sentence in the Discussion to point out this important caveat. “More generally, this discrepancy when looking at different choices of variables raises the issue that when studying social behavior of animals in a group, it is important to test and compare interaction models with different complexity (e.g. pairwise or with higher-order interactions).” We have also toned down our conclusion, restricting the finding that pairwise interactions describe mouse co-localization patterns to the data collected in the Eco-HAB (also see Reviewer 3 Major Point 2).

      Reviewer #3 (Public review):

      Summary:

      Chen et al. present a thorough statistical analysis of social interactions, more precisely, co-occupying the same chamber in the Eco-HAB measurement system. They also test the effect of manipulating the prelimbic cortex by using TIMP-1 that inhibits the MMP-9 matrix metalloproteinase. They conclude that altering neural plasticity in the prelimbic cortex does not eliminate social interactions, but it strongly impacts social information transmission.

      Strengths:

      The quantitative approach to analyzing social interactions is laudable and the study is interesting. It demonstrates that the Eco-HAB can be used for high throughput, standardized and automated tests of the effects of brain manipulations on social structure in large groups of mice.

      Weaknesses:

      A demonstration of TIMP-1 impairing neural plasticity specifically in the prelimbic cortex of the treated animals would greatly strengthen the biological conclusions. The Eco-HAB provides coarser spatial information compared to some other approaches, which may influence the conclusions.

      Recommendations for the authors:  

      Reviewer #3 (Recommendations for the authors):

      Major points

      (1) Do the Authors have evidence that TIMP-1 was effective, as well as specific to the prelimbic cortex?

      We refer to the literature for the effectiveness of TIMP-1 and its specificity to the prelimbic cortex.

      Specifically, the study by Okulski et al. (Biol. Psychiatry 2007) provides clear evidence that TIMP-1 plays a role in synaptic plasticity in the prefrontal cortex. They showed that TIMP-1 is induced in the medial prefrontal cortex (mPFC) following stimulation that triggers late long-term potentiation (LTP), a key model of synaptic plasticity. Overexpression of TIMP-1 in the mPFC blocked the activity of matrix metalloproteinases (MMPs) and prevented the induction of late LTP in vivo. Similar effects were observed with pharmacological inhibition of MMP-9 in vitro, reinforcing the idea that TIMP-1 regulates extracellular proteolysis as part of the plasticity mechanism in the prefrontal cortex. These findings confirm that TIMP-1 is both effective and active in this specific brain region.

      Further evidence comes from Puścian et al. (Mol. Psychiatry 2022), who used TIMP-1-loaded nanoparticles to influence neuronal plasticity in the amygdala. They found that TIMP-1 affected MMP expression, LTP, and dendritic morphology, showing its impact on synaptic modifications. More directly relevant, Winiarski et al. (Sci. Adv. 2025) demonstrated that injecting TIMP-1-loaded nanoparticles into the prelimbic cortex altered responses to social stimuli, further supporting the idea that TIMP-1 has region-specific effects on behavioral processes.

      We have also updated the main text (page 8, 1st paragraph of “Effect of impairing neuronal plasticity in the PL on subterritory preferences and sociability”) of the manuscript to include the above references.

      (2) The Authors seem to suggest that one main reason for the different results compared to Shemesh et al. 2013 was the coarseness of the Eco-HAB data. In this case, I think this conclusion should be toned down because of this significant caveat.

      We thank the reviewer for pointing this out, and agree that this caveat and difference should be emphasized. To tone down the conclusion, we have

      (1) added details about the Eco-HAB (it being coarse-grained, etc.) in the abstract to tone down the conclusion.

      (2) added to the results summary in the Discussion (top of page 12) that the results are “within the setup of the semi-naturalistic Eco-HAB experiments”

      (3) added to the Discussion (page 13) that the different results compared to Shemesh et al 2013 means that general studies of social behavior need to compare models with different levels of complexity (e.g. pairwise vs. higher-order interactions). (Also see Reviewer 2 Comment 2.)

      Minor points

      (1) Please explain what is measured in Fig. 1C (what is on the y axis?).

      Figure 1C shows the activity of the mice as measured by the rate of transitions, i.e. the number of times the mice switch boxes during each hour of the day, averaged over all N = 15 mice and T = 10 days (cohort M1). The error bars represent the variability of activity across individuals or across days. For mouse-to-mouse variability (blue), we first compute for each mouse its number of transitions averaged over the same hour for all 10 days, then compute the standard deviation of these averages across all 15 mice and plot it as error bars. For day-to-day variability (orange), we first compute for each day the number of transitions for each hour averaged over all mice, then compute the standard deviation of these averages across all 10 days as the error bar. We have added this detailed explanation to the caption of Figure 1C.
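      The two error-bar computations described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the nested-list layout `transitions[m][d][h]` and the function name `hourly_activity` are hypothetical, and we assume sample standard deviations (`statistics.stdev`), since the response does not state which estimator was used.

```python
from statistics import mean, stdev


def hourly_activity(transitions):
    """Summarize hourly activity as in the Figure 1C description (illustrative sketch).

    transitions[m][d][h] is the (hypothetical) number of box switches made by
    mouse m on day d during hour h. Returns one tuple per hour:
    (grand mean, mouse-to-mouse std, day-to-day std).
    """
    n_mice = len(transitions)
    n_days = len(transitions[0])
    n_hours = len(transitions[0][0])
    result = []
    for h in range(n_hours):
        # Mean over all mice and all days for this hour.
        grand = mean(transitions[m][d][h] for m in range(n_mice) for d in range(n_days))
        # Mouse-to-mouse: average each mouse over days, then spread across mice.
        per_mouse = [mean(transitions[m][d][h] for d in range(n_days)) for m in range(n_mice)]
        # Day-to-day: average each day over mice, then spread across days.
        per_day = [mean(transitions[m][d][h] for m in range(n_mice)) for d in range(n_days)]
        result.append((grand, stdev(per_mouse), stdev(per_day)))
    return result
```

      With a toy input of 2 mice, 2 days and 1 hour, `hourly_activity([[[2], [4]], [[6], [8]]])` gives a mean of 5 transitions, a mouse-to-mouse spread of about 2.83 and a day-to-day spread of about 1.41, showing that the two error bars summarize different sources of variability.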

      (2) In Fig. 3, it would be better to present the control group also in the main figure instead of the supplementary.

      We have merged Figure 3 and Figure 3 Supplementary 1 to present the control group also in the main figure.

      (3) In Fig. 3 and corresponding supplements, there seems to be a large difference between males and females. I think this would deserve some more discussion.

      While this is not the main focus of this paper, we agree with the reviewer that the difference between males and females is important and deserves attention in the Discussion and in future studies. Thus, we have added a paragraph to the Discussion (lines 394-399, bottom of page 12).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this report, the authors made use of a murine cell line derived from a MYC-driven liver cancer to investigate the gene expression changes that accompany the switch from normoxic to hypoxic conditions during 2D growth and the switch from 2D monolayer to 3D organoid growth under normoxic conditions. They find a significant (ca. 40-50%) overlap among the genes that are dysregulated in response to hypoxia in 2D cultures and in response to spheroid formation. Unsurprisingly, hypoxia-related genes were among the most prominently deregulated under both sets of conditions. Many other pathways pertaining to metabolism, splicing, mitochondrial electron transport chain structure and function, DNA damage recognition/repair, and lipid biosynthesis were also identified.

      We thank this reviewer for his/her time and efforts, and the insightful comments.

      Major comments:

      (1) Lines 239-240: The authors state that genes involved in DNA repair were identified as being necessary to maintain survival of both 2D and 3D cultures (Figure S6A). Hypoxia is a strong inducer of ROS. Thus, the ROS-specific DNA damage/recognition/repair pathways might be particularly important. The authors should look more carefully at the various subgroups of the many genes that are involved in DNA repair. They should also obtain at least a qualitative assessment of ROS and ROS-mediated DNA damage by staining for total and mitochondrial-specific ROS using dyes such as CM-H2-DCFDA and MitoSox. Actual direct oxidative damage could be assessed by immunostaining for 8-oxo-dG and related to the sub-types of DNA damage-repair genes that are induced. The centrality of DNA damage genes also raises the question as to whether the previously noted prominence of the TP53 pathway (see point 5 below) might represent a response to ROS-induced DNA damage.

      We thank this reviewer for the insightful comments and agree that ROS induced by hypoxia could play a role in modulating DNA repair and, consequently, cellular essentiality. Although the pathway enrichment in Figure S6A (now Figure 2-figure supplement 4A) showed that the DNA repair pathway was essential to cell survival in hypoxia and 3D cultures, the genes associated with this pathway (Ddb1;Brf2;Gtf3c5;Guk1;Taf6) are not typical DNA repair genes; they are more likely involved in gene transcription. It will nevertheless be interesting to determine whether they are specifically involved in the DNA damage response to ROS, which is beyond the scope of this study.

      (2) Because most of the pathway differences that distinguish the various cell states from one another are described only in terms of their transcriptome variations, it is not always possible to understand what the functional consequences of these changes actually are. For example, the authors report that hypoxia alters the expression of genes involved in PDH regulation but this is quite vague and not backed up with any functional or empirical analyses. PDH activity is complex and regulated primarily via phosphorylation/dephosphorylation (usually mediated by PDK1 and PDP2, respectively), which in turn are regulated by prevailing levels of ATP and ADP. Functionally, one might expect that hypoxia would lead to the down-regulation of PDH activity (i.e. increased PDH-pSer392) as respiration changes from oxidative to non-oxidative. This would not be appreciated simply by looking at PDH transcript levels. This notion could be tested by looking at total and phospho-PDH by western blotting and/or by measuring actual PDH activity as it converts pyruvate to AcCoA.

      We agree with this reviewer that PDH activity can be regulated by multiple factors and that this merits further validation by other approaches.

      (3) Line 439: Related to the above point: the authors state: "It is likely that blockade of acetyl-CoA production by PDH knockout may force cells to use alternative energy sources under hypoxic and 3D conditions, averting the Warburg effect and promoting cell survival under limited oxygen and nutrient availability in 3D spheroids." This could easily be tested by determining whether exogenous fatty acids are more readily oxidized by hypoxic 2D cultures or spheroids than occurs in normoxic 2D cultures.

      We thank the reviewer for this suggestion. We apologize that we were not able to validate all of these hypotheses experimentally in this study.

      (4) Line 472: "Hypoxia induces high expression of Acaca and Fasn in NEJF10 cells indicating that hypoxia promotes saturated fatty acid synthesis...The beneficial effect of Fasn and Acaca KO to NEJF10 under hypoxia is probably due to reduction of saturated fatty acid synthesis, and this hypothesis needs to be tested in the future." As with the preceding comment, this supposition could readily be supported directly by, for example, performing western blots for these enzymes and by showing that hypoxic 2D cells or spheroids convert more AcCoA into lipid.

      We thank the reviewer for this suggestion. However, functional validation of the Fasn and Acaca KO is beyond the scope of this study.

      (5) In Supplementary Figure 2B&C, the central hub of the 2D normoxic cultures is Myc (as it should well be) whereas, in the normoxic 3D cultures, the central hub is TP53 and Myc is not even present. The authors should comment on this. One would assume that Myc levels should still be quite high given that Myc is driven by an exogenous promoter. Does the centrality of TP53 indicate that the cells within the spheroids are growth-arrested, being subjected to DNA damage and/or undergoing apoptosis?

      The predicted transcription factor activity analysis was based on the differential ATAC-seq peaks among the different culture conditions, identified through pairwise comparisons. The absence of TP53 or MYC from the network for a given condition does not mean that their activity is absent under that condition.

      “…the centrality of TP53 indicate that the cells within the spheroids are growth-arrested, being subjected to DNA damage and/or undergoing apoptosis?” This reviewer has raised an interesting question. We are investigating this hypothesis and hopefully we can give a clear answer in the future.

      (6) In the Materials and Methods section (lines 711-720), the description of how spheroid formation was achieved is unclear. Why were the cells first plated into non-adherent 96 well plates and then into nonadherent T75 flasks? Did the authors actually utilize and expand the cells from 144 T75 flasks and did the cells continue to proliferate after forming spheroids? Many cancer cell types will initially form monolayers when plated onto non-adherent surfaces such as plastic Petri dishes and will form spheroid-like structures only after several days. Other cells will only aggregate on the "non-adherent" surface and form spheroid-like structures but will not actually detach from the plate's surface. Have the authors actually documented the formation of true, non-adherent spheroids at 2 days and did they retain uniform size and shape throughout the collection period? The single photo in Supplementary Figure 1 does not explain when this was taken. The authors include a schematic in Figure 2A of the various conditions that were studied. A similar cartoon should be included to better explain precisely how the spheroids were generated and clarify the rationale for 96 well plating. Overall, a clearer and more concise description of how spheroids were actually generated and their appearance at different stages of formation needs to be provided.

      The cells were initially plated in non-adherent 96-well plates to facilitate the formation of spheroids in a controlled and uniform manner. As correctly mentioned by the reviewer, during the initial stages, cells cultured on non-adherent surfaces often form aggregates or clumps, and it takes a few days for them to develop into solid spheroids.

      In our study, we aimed to achieve 3D spheroid formation immediately following the transduction process to allow for screening under both 2D and 3D conditions. Plating the cells into 96-well plates enabled us to monitor and control the formation of spheroids in smaller volumes before scaling up the culture in non-adherent T75 flasks for subsequent experimental steps. This setup allows us to maintain gene editing processes under both 2D and 3D conditions.

      Regarding the proliferation and uniformity of spheroids:

      • Yes, the spheroids continued to proliferate after their formation.

      • Yes, true, non-adherent spheroids were documented as early as the next day. This was visually confirmed under microscopy, and size uniformity was maintained throughout the collection period by following optimized culture protocols.

      We also agree with the reviewer’s suggestion to include a cartoon schematic similar to Figure 2A, illustrating the spheroid generation process and clarifying the rationale for using 96-well plates. We have included such a cartoon and a spheroid growth curve monitored by Incucyte as Figure 2-figure supplement 2.

      (7) The authors maintained 2D cultures in either normoxic or hypoxic (1% O2) states during the course of their experiments. On the other hand, 3D cultures were maintained under normoxic conditions, with the assumption that the interiors of the spheroids resemble the hypoxic interiors of tumors. However, the actual documentation of intra-spheroid hypoxia is never presented. It would be a good idea for the authors to compare the degree of hypoxia achieved by 2D (1% O2) and 3D cultures by staining with a hypoxia-detecting dye such as Image-iT Green. Comparing the fluorescence intensities in 2D cultures at various O2 concentrations might even allow for the construction of a "standard curve" that could serve to approximate the actual internal O2 concentration of spheroids. This would allow the authors to correlate the relative levels of hypoxia between 2D and 3D cultures.

      This is an excellent idea, and we will certainly perform this experiment in our future work.

      (8) Related to the previous 2 points, the authors performed RNAseq on spheroids only 48 hours after initiating 3D growth. I am concerned that this might not have been long enough for the cells to respond fully to their hypoxic state, especially given my concerns in Point 6. Might the results have been even more robust had the authors waited longer to perform RNA seq? Why was this short time used?

      We agree with this reviewer. We are not certain that 48 hours was the ideal time point; a longitudinal experiment harvesting samples at different time points may be necessary in future experiments.

      (9) What happens to the gene expression pattern if spheroids are re-plated into standard tissue culture plates after having been maintained as spheroids? Do they resume 2D growth and does the gene expression pattern change back?

      This is a great question, and we had not previously considered what the gene expression pattern would be if spheroids were re-plated in 2D. This could be a challenging experiment because the gene expression and epigenetic changes are time-dependent. However, the cells do grow well after being re-plated in 2D.

      (10) Overall, the paper is quite descriptive in that it lists many gene sets that are altered in response to hypoxia and the formation of spheroids without really delving into the actual functional implications and/or prioritizing the sets. Some of these genes are shown by CRISPR screening to be essential for maintaining viability although in very few cases are these findings ever translated into functional studies (for example, see points 1-4 above). The list of genes and gene pathways could benefit from a better explanation and prioritization of which gene sets the authors believe to be most important for survival in response to hypoxia and for spheroid formation.

      This was a genome-wide study that integrated RNA-seq, ATAC-seq and CRISPR KO screening, providing a resource for understanding oncogenic pathways under different culture conditions. We believe we have clearly articulated the important genes and pathways in our abstract.

      (11) The authors used a single MYC-driven tumor cell line for their studies. However, in their original paper (Fang, et al. Nat Commun 2023, 14: 4003.) numerous independent cell lines were described. It would help to know whether RNAseq studies performed on several other similar cell lines gave similar results in terms of up- and down-regulated transcripts (i.e. how representative NEJF10 cells are of the other cell lines).

      We have not generated RNA-seq data for these cell lines cultured in different conditions.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Fang et al. provides a tour-de-force study uncovering cancer cells' varied dependencies on several gene programs for their survival under different biological contexts. The authors addressed genomic differences in 2D vs 3D cultures and how hypoxia affects gene expression. They used a Myc-driven murine liver cancer model grown in 2D monolayer culture in normoxia and hypoxia as well as cells grown as 3D spheroids and performed a CRISPR-based genome-wide KO screen to identify genes that play important roles in cell fitness. Some context-specific gene effects were further validated by in vitro and in vivo gene KO experiments.

      Strengths:

      The key findings in this manuscript are:

      (1) Close to 50% of differentially expressed genes were common between 2D Hypoxia and 3D spheroids conditions but they had differences in chromatin accessibility.

      (2) VHL-HIF1a pathway had differential cell fitness outcomes under 2D normoxia vs 2D hypoxia and 3D spheroids.

      (3) Individual components of the mitochondrial respiratory chain complex had contrasting effects on cell fitness under hypoxia.

      (4) Knockout of organogenesis or developmental pathway genes led to better cell growth specifically in the context of 3D spheroids and knockout of epigenetic modifiers had varied effects between 2D and 3D conditions.

      (5) Another key program that leads to different cell fitness outcomes in normoxia vs hypoxia is lipid and fatty acid metabolism.

      (6) Prmt5 is a key essential gene under all growth conditions, but in the context of 3D spheroids even partial loss of Prmt5 has a synthetic lethal effect with Mtap deletion and Mtap is epigenetically silenced specifically in the 3D spheroids.

      We thank this reviewer for acknowledging the strengths of our study.

      Issues to address:

      (1) The authors should clarify the link between the findings of the enrichment of TGFb-SMAD signaling REACTOME pathway to the findings that knocking out TGFb-SMAD pathway leads to better cell fitness outcomes for cells in the 3D growth conditions.

      We have clarified this link in the abstract by stating: “Notably, multicellular organogenesis signaling pathways including TGFb-SMAD, which is upregulated in 3D culture, specifically constrict the uncontrolled cell proliferation in 3D while inactivation of epigenetic modifiers (Bcor, Kmt2d, Mettl3 and Mettl14) has opposite outcomes in 2D vs. 3D.”

      (2) Supplementary Figure 4C has been cited in the text but doesn't exist in the supplementary figures section.

      Sorry for this typo. It should be 5C which is Figure 2-figure supplement 3C in the new version of MS. We have corrected it now.

      (3) A small figure explaining this ABC-Myc driven liver cancer model in Supplementary Figure 1 would be helpful to provide context.

      We appreciate this suggestion. We have added a cartoon as Figure 1-figure supplement 1A to indicate the procedure for generation of this model.

      (4) The method for spheroids formation is not found in the method section.

      We described the method in our previous publication (Nature Communications 2023 Jul 6;14(1):4003.). However, we have now added this information to the Methods section; the procedure is very simple (lines 623-624). We found that the murine liver cancer cell lines readily form spheroids when cultured in low-attachment dishes with standard complete DMEM medium.

      (5) In Supplementary Figure 1b, the comparisons should be stated the opposite way - 3D vs 2D normoxia and 2D-Hypoxia vs 2D-Normoxia.

      We have corrected the legend of Figure S1B, which is now Figure 1B in the revised manuscript.

      (6) There are typos in the legend for Supplementary Figure 10.

      We have checked the typos.

      (7) Consider putting Supplementary Figure 1b into the main Figure 1.

      We have moved both Supplementary Figure 1a and 1b into main Figure 1 as Figure 1A and 1B. Hopefully, this will help the readers to catch the information easily.

      (8) Please explain why only one timepoint (endpoint) was used for 3D spheroids in the CRISPR KO screen experiment, while several timepoints were used for the 2D conditions. Was this for technical convenience?

      As this reviewer speculated, indeed this was for technical convenience. We found that it was technically challenging to split the spheroids for CRISPR screening.

      (9) In line 372, it is indicated that Bcor KO (Fig 5e) had a growth advantage, but this was observed with only one of the gRNAs; the same holds for Kmt2d KO in the same figure, where there was an opposite effect. Please justify the use of only one gRNA.

      We actually used 4 gRNAs for each gene. In the heatmap, although one of the gRNAs for each gene showed some level of enrichment under the hypoxic 2D condition, all of them were highly enriched in 3D.

      (10) Why was a CRISPR-based KO strategy not used for the PRMT5 gene, rather than shRNA? Note that one of the shRNAs for PRMT5 produced almost no knockdown (PRMT5-shRNA2, Figure 7B) but still showed a phenotype (Figure 7D); please explain.

      We used shRNA as a second approach for cross-validation. We agree that the knockdown efficiency of shRNA2 was lower than that of the others, at only about 40%.

      (11) In Figure 7D, which samples (which shRNA group) were being compared to do the t-test?

      The comparisons were for shCtrl and each of the shPRMT5. We have clarified this in figure legend.

      (12) In line 240, it is stated that oxphos gene set is essential for NEJF10 cell survival in both normoxia and hypoxia conditions. But shouldn't oxphos be non-essential in hypoxia as cells move away from oxphos and become glycolytic?

      This is a great question. While indeed hypoxia may promote the switch from oxphos to glycolysis, several studies showed that the low oxygen concentrations in hypoxic regions of tumors may not be limiting for oxphos, and ATP is generated by oxphos in tumors even at very low oxygen tensions (please see review Clin Cancer Res (2018) 24 (11): 2482–2490.). We therefore speculated that NEJF10 cells were still dependent on oxphos for ATP production under hypoxia. However, this needs further investigation. We have added this discussion in our manuscript (line 250-254).

      (13) In line 485 it is mentioned that Pmvk and Mvd genes which are involved in cholesterol synthesis when knocked out had a positive effect on cell growth in 3D conditions and since cholesterol synthesis is essential for cell growth how does this not matter much in the context of 3D - please explain.

      We thank this reviewer for this note. It appeared that only two gRNAs for each gene were upregulated in 3D, which could be due to a technical issue or clonal selection. We have deleted this sentence in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      In this study, Fang et al. systematically investigate the effects of culture conditions on gene expression, genome architecture, and gene dependency. To do this, they cultivate the murine HCC line NEJF10 under standard culture conditions (2D), then under similar conditions but under hypoxia (1% oxygen, 2D hypoxia) and under normoxia as spheroids (3D). NEJF10 was isolated from a murine HCC model that relies exclusively on MYC as a driver oncogene. In principle, (1) RNA-seq, (2) ATAC-seq and (3) genetic screens were then performed in this isogenic system and the results were systematically compared in the three cultivation methods. In particular, genome-wide screens with the CRISPR library Brie were performed very carefully. For example, in the 2D conditions, many different time points were harvested to control the selection process kinetically. The authors note differential dependencies for metabolic processes (not surprisingly, hypoxia signaling is affected) such as the regulation and activity of mitochondria, but also organogenesis signaling and epigenetic regulation.

      Strengths:

      The topic is interesting and relevant and the experimental set-up is carefully chosen and meaningful. The paper is well written. While the study does not reveal any major surprises, the results represent an important resource for the scientific community.

      We thank this reviewer for his/her positive comments.

      Weaknesses:

      However, this presupposes that the statistical analysis and processing are carried out very carefully, and this is where my main suggestions for revision begin. Firstly, I cannot find any information on the number of replicates in RNA- and ATAC-seq. This should be clearly stated in the results section and figure legends and cut-offs, statistical procedures, p-values, etc. should be mentioned as well. In principle, all NGS experiments (here ATAC- and RNA-seq) should be performed in replicates (at least duplicates, better triplicates) or the results should be validated by RT-PCR in independent biological triplicates. Secondly, the quantification of the analyses shown in the figures and especially in the legends is not sufficiently careful. Units are often not mentioned. Example Figure 4a: The legend says: 'gRNA reads' but how can the read count be -1? I would guess these are FC, log2FC, or Z-values. All figure legends need careful revision.

      Based on the reviewer’s suggestions, we have added details about the replicates to the figure legends. For the gRNA read heatmap, the scale bar indicates the Z score; we have added this information to the figure legends.

      Furthermore, I would find a comparison of the sgRNA abundances at the earliest harvesting time with the distribution in the library interesting, to see whether and to what extent selection has already taken place before the three culture conditions were established (minor point).

      This is a great point. Unfortunately, we did not perform such an analysis.

      Recommendations for the authors:

      Reviewing Editor:

      There are three general issues:

      First, there is a lack of detail regarding much of the analysis. In some cases, this makes it difficult to assess the value of the data, albeit, there is generally a consensus the information is really interesting.

      Second, the findings - although provocative - lack mechanistic details and are focused more on descriptive findings. Hence, the manuscript would be improved by some effort at evaluating identified programs and providing some suggestions of mechanisms.

      Third, the authors need to put much more effort into the clarity and tightness of the presentation.

      We have made clarifications in response to the reviewer’s comments.

      Reviewer #1 (Recommendations for the authors):

      Figure S1C. the labeling of the lower x-axis is inverted.

      Due to space limitations, we changed the figure orientation in the previous version of the manuscript. We have rotated the figure back in the new version, where it is now Figure 1-figure supplement 1B.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      The authors address the role of the centromere histone core in force transduction by the kinetochore.

      Strengths:

      They use a hybrid DNA sequence that combines CDEII and CDEIII as well as Widom 601 so they can make stable histones for biophysical studies (provided by the Widom sequence) and maintain features of the centromere (CDE II and III).

      Weaknesses:

      The main results are shown in one figure (Figure 2). Indeed, the centromere core of Widom and CDE II and III contribute to strengthening the binding force for the OA-beads. The data are very nicely done and convincingly demonstrate the point. The weakness is that this is the entire paper. It is certainly of interest to investigators in kinetochore biology, but beyond that, the impact is fairly limited in scope.

      This reviewer might have missed that this is a Research Advance, not an article. Research Advances are limited in scope by definition and provide a new development that builds on research reported in a prior paper. They can be of any length. Our Research Advance builds on our prior work, Hamilton et al., 2020 and provides the new result that native centromere sequences strengthen the attachment of the kinetochore to the nucleosome.

      Reviewer #2:

      Summary:

      This paper provides a valuable addendum to the findings described in Hamilton et al. 2020 (https://doi.org/10.7554/eLife.56582). In the earlier paper, the authors reconstituted the budding yeast centromeric nucleosome together with parts of the budding yeast kinetochore and tested which elements are required and sufficient for force transmission from microtubules to the nucleosome. Although budding yeast centromeres are defined by specific DNA sequences, this earlier paper did not use centromeric DNA but instead the generic Widom 601 DNA. The reason is that it has so far been impossible to stably reconstitute a budding yeast centromeric nucleosome using centromeric DNA.

      In this new study, the authors now report that they were able to replace part of the Widom 601 DNA with centromeric DNA from chromosome 3. This makes the assay more closely resemble the in vivo situation. Interestingly, the presence of the centromeric DNA fragment makes one type of minimal kinetochore assembly, but not the other, withstand stronger forces.

      We thank the reviewer for their careful and positive assessment of our work.

      Which kinetochore assembly turned out to be affected was somewhat unexpected, and can currently not be reconciled with structural knowledge of the budding yeast centromere/kinetochore. This highlights that, despite recent advances (e.g. Guan et al., 2021; Dendooven et al., 2023), aspects of budding yeast kinetochore architecture and function remain to be understood and that it will be important to dissect the contributions of the centromeric DNA sequence.

      We couldn’t agree more.

      Given the unexpected result, the study would become yet more informative if the authors were able to pinpoint which interactions contribute to the enhanced force resistance in the presence of centromeric DNA.

      Strength:

      The paper demonstrates that centromeric DNA can increase the attachment strength between budding yeast microtubules and centromeric nucleosomes.

      Weakness:

      How centromeric DNA exerts this effect remains unclear.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Additional specific mutants would be helpful in interpreting the effect observed. The authors speculate that a small segment of OA near the DNA (based on Dendooven et al., 2023) could be important. Would it be possible to introduce specific mutations and test this?

      This would be an interesting study but is far beyond the scope of a Research Advance. In fact, it would make a nice thesis project for a new student. Although perhaps not obvious, these studies require a large set of reagents, including wrapped nucleosomes, which must be made fresh (they cannot be frozen), and five purified recombinant complexes, purified by specialized protocols that maintain their activity. Moreover, each datapoint is gathered one at a time. For example, the data in Figure 2 in this manuscript includes 343 datapoints acquired one at a time over the course of 1.5 years.

      (2) Please provide the sequences of the other CEN3-W601 chimeras that were tested and did NOT stably wrap centromeric histone octamers. This may help others to design yet different constructs in the future. (Maybe the information is there and I didn't see it?)

      We fully agree and thank the reviewer for this excellent suggestion. The sequences and summaries of their wrapping stability are now provided in Table 3, page 17.

      (3) I wonder whether the authors tested the CON3 sequence used in Dendooven et al., 2023. If not, could it be tested? This would more tightly couple the functional assay shown here with the structural work.

      We did not test the CON3 sequence, which was published several years after the start of this work. We agree that a tight coupling between the functional assay and the structural work would be useful. However, we also see the advantage of being able to go beyond the structural work and include even more CEN3 sequence than has so far been possible in the structural work.  

      In addition to measuring the role of DNA sequence in Okp1/Ame1 attachment to the nucleosome, we were interested in the role of DNA sequence in the attachment of Mif2. Therefore, we included all 35 bp of the Mif2 footprint in our chimeric CCEN DNA sequence. CON3 only includes 8 bp from CDEII. We did produce stable nucleosomes using CEN3-601 from Guan et al. (see Table 3). Again, CEN3-601 only includes 8 bp of the Mif2 footprint, so we opted to study nucleosomes wrapped in our CCEN DNA with the entire Mif2 footprint. Curiously, we found that even the entire Mif2 footprint was not enough to reproduce the DNA sequence specificity seen in the EMSA experiments reported by Xiao et al., 2017.

      To help readers understand the differences between all these constructs, we have included them in Table 3.

      (4) Would an AlphaFold 3 prediction of the assemblies used in this paper be feasible and useful?

      The structures of the Dam1 complex (Jenni et al., 2018), Ndc80 complex (Zahm et al., 2023 and references therein), MIND complex (Dimitrova et al., 2016), OA complex (Dendooven et al., 2023), and the nucleosome (Xiao et al., 2017; Yan et al., 2019; Guan et al., 2021; Dendooven et al., 2023) are published. The interactions between many of these complexes are understood beyond the level that AlphaFold3 could provide (Dimitrova et al., 2016; Dendooven et al., 2023). One of the main questions is how Mif2 interacts with the nucleosome and the other components of the kinetochore. Even structural analyses that included Mif2 in the assembly detect little or no Mif2 in the final structure. Unfortunately, AlphaFold3 is also not helpful, as it predicts only the structure of the dimerization domain, which was already known (Cohen et al., 2008).

      AlphaFold3 predicts the rest of Mif2 is largely unstructured with several alpha helices predicted with low confidence.

      (5) Given that the centromeric DNA piece included should be able to bind the CBF3 complex, would it be possible to add this complex and test the effect on force transmission?

      This would be an interesting experiment, and we do expect CBF3 to bind. As stated above, this is far beyond the scope of this Research Advance. In our experience, with each new kinetochore subcomplex that we add into our reconstitutions, there are new challenges purifying the subcomplex in active form and in sufficient quantity. We are eager to add CBF3, but this is not something we can pull off in the context of this Research Advance. Thank you again for the time and energy spent reviewing our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors set out to analyse the roles of the teichoic acids of Streptococcus pneumoniae in supporting the maintenance of the periplasmic region. Previous work has proposed the periplasm to be present in Gram-positive bacteria, and here an advanced electron microscopy approach was used. This also showed a likely role for both wall and lipo-teichoic acids in maintaining the periplasm. Next, the authors use a metabolic labelling approach to analyse the teichoic acids. This is a clear strength, as this method cannot be used for most other well-studied organisms. The labelling was coupled with super-resolution microscopy to be able to map the teichoic acids at the subcellular level and a series of gel separation experiments to unravel the nature of the teichoic acids and the contribution of genes previously proposed to be required for their display. The manuscript could be an important addition to the field, but there are a number of technical issues which somewhat undermine the conclusions drawn at the moment. These are shown below and should be addressed. More minor points are covered in the private Recommendations for Authors.

      Weaknesses to be addressed:

      (1) l. 144 Was there really only one sample that gave this resolution? Biological repeats of all experiments are required.

      CEMOVIS is a very challenging method that is not amenable to numerous repeats. However, multiple images were recorded from at least two independent samples for each strain. Additional sample images are shown in a new Fig. S3.

      CETOVIS is even more challenging (only two publications in PubMed since 2015) and was performed on a single ultrathin section that, exceptionally, lay perfectly flat on the EM grid, allowing tomography data acquisition on ∆tacL cells. The reconstructed tomogram confirmed the absence of a granular layer in the depth of the section. Additionally, the numbering of Fig. S4A-B (previously misidentified as Fig. S2A-B) has been corrected in the text of V2.

      (2) Fig. 4A. Is the pellet recovered at "low" speeds not just some of the membrane that would sediment at this speed with or without LTA? Can a control be done using an integral membrane protein and Western Blot? Using the tacL mutant would show the behaviour of membranes alone.

      We think that the pellet is not just some of the membrane but most of it. In support of this view, the “low” speed pellets after enzymatic cell lysis contain not just some membrane lipids, but most of them (Fig. S10A). We therefore expect membrane proteins to be also present in this fraction. We performed a Western blot using antibodies against the membrane protein PBP2x (new Fig. S7C). Unfortunately, no signal was detected, most likely due to protein degradation by contaminant proteases that we could trace to the purchased mutanolysin. The same sedimentation properties were observed with the ∆tacL strain, as shown in Fig. 6A. However, in the ∆tacL strain the membrane pellet still contains membrane-bound TA precursors. It is therefore impossible to test definitively whether pneumococcal membranes totally devoid of TA would sediment in the same way.

      (3) Fig. 4A. Using enzymatic digestion of the cell wall and then sedimentation will allow cell wall associated proteins (and other material) to become bound to the membranes and potentially affect sedimentation properties. This is what is in fact suggested by the authors (l. 1000, Fig. S6). In order to determine if the sedimentation properties observed are due to an artefact of the lysis conditions, a physical breakage of the cells using a French Press should be carried out and then membranes purified by differential centrifugation. This is a standard and well-established method (low-speed to remove debris and high-speed to sediment membranes) that has been used for S. pneumoniae over many years but would seem counter to the results in the current manuscript (for instance Hakenbeck, R. and Kohiyama, M. (1982), Purification of Penicillin-Binding Protein 3 from Streptococcus pneumoniae. European Journal of Biochemistry, 127: 231-236).

      Thank you for this suggestion. We have tested this hypothesis by breaking cells with a Microfluidizer followed by differential centrifugation. This experiment, which requires a large minimal volume, was performed with unlabeled cells (due to the cost of reagents) and assessed by Western blot using antibodies against the membrane protein PBP2x (new Fig. S7C). In this case, the majority of the membrane material was found in the high-speed pellet, as expected.

      We also applied the spheroplast lysis procedure of Flores-Kim et al. to the labeled cells, and found that most of the labeled material sedimented at low speed (new Fig. S7B), as observed with our own procedure.

      With these new results, the section on membrane density has been removed from the Supplementary Information. Instead, the fractionation is further discussed in terms of size of membrane fragments and presence of intact spheroplasts in the notes in Supplementary Information preceding Fig. S7.

      (4) l. 303-305. The authors suggest that the observed LTA-like bands disappear in a pulse chase experiment (Fig. 6B). What is the difference between this and Fig. 5B, where the bands do not disappear? Fig. 5C is the WT and was only pulse labelled for 5 min and so would one not expect the LTA-like bands to disappear as in 6B?

      Fig. 6B shows a pulse-chase experiment with strain ∆tacL, whereas Fig. 5C shows a similar experiment with the parental WT strain. The disappearance of the LTA-like band pattern with the ∆tacL strain (Fig. 6B), and their persistence in the WT strain (Fig. 5C), indicate that these bands are the undecaprenyl-linked TA in ∆tacL and proper LTA in the WT. A sentence has been added to better explain this point in V2.

      Note that we have exchanged the previous Fig. 5C and Fig. S13B, so that the experiments of Fig. 5A and 5C are in the same medium, as suggested by Reviewer #2.

      (5) Fig. 6B, l. 243-269 and l. 398-410. If, as stated, most of the LTA-like bands are actually precursor, then how can the quantification of LTA stand as stated in the text? Should the "Titration of Cellular TA" section be re-evaluated or removed? If you compare Fig. 6C WT extract incubated at RT and 110 °C, it seems like a large decrease in amount of material at the higher temperature. Thus, the WT has a lot of precursors in the membrane? This needs to be quantified.

      Indeed, the quantification of the ratio of LTA and WTA in the WT strain rests on the assumption that the amount of membrane-linked polymerized TA precursors is negligible in this strain. This assumption is now stated in the Titration section. We think it is the case. The true LTA and TA precursors do not have exactly the same electrophoretic mobility, being shifted relative to each other by about half a ladder “step”. This difference is visible when samples are run in adjacent lanes on the same gel, as in the new Fig. 6C. The difference of migration was well documented in the original paper about the deletion of tacL, although tacL was known as rafX at that time, and the ladders were misidentified as WTA (Wu et al. 2014. A novel protein, RafX, is important for common cell wall polysaccharide biosynthesis in Streptococcus pneumoniae: implications for bacterial virulence. J Bacteriol. 196, 3324-34. doi: 10.1128/JB.01696-14). This reference was added in V2. The experiment in the new Fig. 6C was repeated to have all samples on the same gel and treated at a lower temperature. The minor effect on the amount of LTA when WT cells are heated at pH 4.2 may be due to the removal of some labeled phosphocholine. We have NMR evidence that the phosphocholine in position D is labile to acidic treatment of LTA, and it may be lacking in some cases, as reported by Hess et al. (Nat Commun. 2017 Dec 12;8(1):2093. doi: 10.1038/s41467-017-01720-z).

      (6) L. 339-351, Fig. 6A. A single lane on a gel is not very convincing as to the role of LytR. Here, and throughout the manuscript, wherever statements concerning levels of material are made, quantification needs to be done over appropriate numbers of repeats and with densitometry data shown in SI.

      Yes indeed. Apart from the titration of TA in the WT strain, we haven’t yet carried out a thorough quantification of TA or LTA/WTA ratio in different strains and conditions, although we intend to do so in a follow-up study, using the novel opportunities offered by the method presented here.

      However, to better substantiate our statement regarding the ∆lytR strain, we have quantified two experiments performed in C-medium with azido-choline, and two experiments of pulse labeling in BHI medium. The results are presented in the additional supplementary Fig. S14. The value of 51% was a calculation error, and was corrected to 41%. Likewise, the decrease in the WTA/LTA ratio was corrected to 5 to 7-fold.

      (7) 14. l. 385-391. Contrary to the statement in the text, the zwitterionic TA will have associated counterions that result in net neutrality. It will just have both -ve and +ve counterions in equal amounts (dependent on their valency), which doesn't matter if it is doing the job of balancing osmolarity (rather than charge).

      Thank you for raising this point. The paragraph has been corrected in V2.

      Reviewer #2 (Public review):

      The Gram-positive cell wall consists in large part of TAs, and is essential for most bacteria. However, TA biosynthesis and regulation is highly understudied because of the difficulties in working with these molecules. This study closes some of our important knowledge gaps related to this and provides new and improved methods to study TAs. It also shows an interesting role for TAs in maintaining a 'periplasmic space' in Gram positives. Overall, this is an important piece of work. It would have been more satisfying if the possible causal link between TAs and periplasmic space had been more deeply investigated with complemented mutants and CEMOVIS. For the moment, there is clearly something happening, but it is not clear if this only happens in TA mutants or also in strains with capsules/without capsules and in PG mutants, or in lafB (essential for production of another glycolipid) mutants. Finally, some very strong statements are made suggesting several papers in the literature are incorrect, without actually providing any substantiation/evidence supporting these claims. Nevertheless, I support the publication of this work as it pioneers some new methods that will definitively move the field forward.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) l. 55 It is stated that TA are generally not essential. This needs to be introduced in a little more detail, as in several species they are collectively essential. Some more references are needed here to give context.

      We have expanded the paragraph and added a selection of references in V2.

      (2) l. 63 and Fig. 1A. Is the model based on the images from this paper? Is the periplasm as thick as the peptidoglycan layer? Would you not expect the density of WTA to be the same throughout the wall, rather than less inside? Do the authors think that the TA are present as rods in the cell envelope and because of this the periplasm looks a little like a bilayer, is this so? Is the relative thickness of the layers based on the data in the paper (Table 1)?

      The model proposed in Fig. 1A is not based on our data. It is a representation of the model proposed by Harold Erickson, and the appropriate reference has been added to the figure legend in V2. We do not speculate on the relative density of WTA inside the peptidoglycan layer, at the surface or in the periplasm. The only constraint from the model is that the density of WTA in the periplasm should be sufficient for self-exclusion and allow the brush polymer theory to apply. The legend has been amended in V2.

      We indeed think that the bilayer appearance of the periplasmic space in the wild type strain, and the single layer periplasmic space in the ∆tacL and ∆lytR strains, support Erickson’s model. Although the model was drawn arbitrarily, it turns out that the relative thickness of the peptidoglycan and periplasmic layers is in rough agreement with the measurements reported in Table 1.

      (3) Fig. 2. It is hard to orient oneself to see the layers. The use of the term periplasmic space (l. 132) and throughout is probably not wise as it is not a space.

      We prefer to retain this nomenclature since the term periplasmic space has been used in all the cell envelope CEMOVIS publications and is at the core of Erickson’s hypothesis about these observations and teichoic acids.

      (4) L. 147. This is not referring to Fig. S2A-B as suggested but Fig. S3A-B.

      This has been corrected.

      (5) l. 148. How do you know the densities observed are due to PG or certainly PG alone? Perhaps it is better to call this the cell wall.

      Yes. Cell wall is a better nomenclature and the text and Table 1 have been corrected in V2, in accordance with Fig. 2.

      (6) l. 165. It is also worth noting that peripheral cell wall synthesis also happens at the same site so this may well not be just division.

      Yes. We have replaced “division site” by “mid-cell” in V2.

      (7) l. 214 What is the debris? If PG digestion has been successful then there will be marginal debris. Is this pellet translucent (like membranes)? If you use fluorescently labelled PG in the preparation has it all disappeared, as would be expected by fully digested and solubilised material?

      In traditional protocols of bacterial membrane preparation, a low-speed centrifugation is first performed to discard “debris” that to our knowledge have not been well characterized but are thought to consist of unbroken cells and large fragments of cell wall. After enzymatic degradation of the pneumococcal cell wall, the low-speed pellet is not translucent as in typical membrane pellets after ultracentrifugation, but is rather loose, unlike a dense pellet of unbroken cells. A description of the pellet appearance was added in V2.

      It is a good idea to check if some labeled PG is also pelleted at low-speed after digestion. In a double labeling experiment using azido-choline and a novel unpublished metabolic probe of the PG, we found that the PG was fully digested and labeled fragments migrated as a couple of fuzzy bands likely corresponding to different labeled peptides. These species were not pelleted at low speed.

      (8) l. 219. Can you give a reference to certify that the low mobility material is WTA? Why does it migrate differently than LTA? Or is the PG digestion not efficient?

      WTA released from sacculi by alkaline lysis were found to migrate as a smear at the top of native gels revealed by alcian-blue silver staining, which is incompatible with SDS (Flores-Kim, 2019, 2022). The references have been added in V2. It could be argued in this case that the smearing was due to partial degradation of the WTA by the alkaline treatment.

      Bui et al. (2012) reported the preparation of WTA by enzymatic digestion of sacculi, but the resulting WTA were without muropeptide, presumably due to a step of boiling at pH 5 used to deactivate the enzymes.

      To our knowledge, this is the first report of pneumococcal WTA prepared by digestion of sacculi and analyzed by SDS-PAGE. Since the migration of WTA in native and SDS-PAGE is similar, we hypothesize that they do not interact significantly with the dodecyl sulphate, in contrast to the LTA, which bear a lipidic moiety. The fuzziness of the WTA migration pattern may also result from the greater heterogeneity due to the attached muropeptide, such as different lengths (di-, tetra-saccharide…), different peptides despite the action of LytA (tri-, tetra-peptide…), different O-acetylation status, etc.

      (9) L. 226-227, Fig S8. Presumably several of the major bands on the Coomassie stained gel are the lysozyme, mutanolysin, recombinant LytA, DNase and RNase used to digest the cell wall etc.? Can the sizes of these proteins be marked on the gel. Do any of them come down with the material at low-speed centrifugation?

      We have provided a gel showing the different enzymes individually and mixed (new Fig. S9G). While performing several experiments of this type, we found that the mutanolysin might be contaminated with proteases. The enzymes do not appear to sediment at low speed.

      (10) Fig. S9B. It is difficult to interpret what is in the image as there appear to be 2 populations of material (grey and sometimes more raised). Does the 20,000 g material look the same?

      Fig. S10B is a 20,000 × g pellet. We agree that there appear to be two types of membrane vesicles, but we do not know their nature.

      (11) l. 277 and Fig. 5A. Why is it "remarkable" that there are apparently more longer LTA molecules as the cell reach stationary phase?

      This is the first time that a change of TA length is documented. Such a change could conceivably have consequences for the binding and activity of CBPs and the physiology of the cell envelope in general. These questions should be addressed in future studies.

      (12) l. 280. How do you know which is the 6-repeat unit?

      It is an assumption based on previous analyses by Gisch et al. (J Biol Chem 2013, 288(22):15654-67. doi: 10.1074/jbc.M112.446963). The reference was added.

      (13) Fig. 5A and C. Panel C, the cells were grown in a different medium and so are not comparable to Panel A. Why is Fig. S12B not substituted for 5B? Presumably these are exponential phase cells.

      We have swapped Fig. S13B and Fig. 5C in V2, as suggested, and changed the text and legends accordingly.

      Reviewer #2 (Recommendations for the authors):

      L30: vitreous sections?

      Corrected in V2.

      L32: as their main universal function --> as a universal function. To show it's the main universal function, you will need to look at this across various bacterial species.

      Changed to “possible universal function” in V2.

      L35: enabled the titration the actual --> titration of the actual?

      Corrected in V2.

      L34: consider breaking up this very long sentence.

      Done in V2.

      L37: may compensate the absence--> may compensate for the absence.

      Corrected in V2.

      L45: Using metabolic labeling and electrophoresis showed --> Metabolic labeling and...

      Corrected in V2.

      L46: This finding casts doubts on previous results, since most LTA were likely unknowingly discarded in these studies. This needs to be rephrased and is unnecessarily callous. While the current work casts doubts on any quantitative assessments of actual LTA levels measured in previous studies, it does not mean any qualitative assessments or conclusions drawn from these experiments are wrong. Better would be to say: These findings suggest that previously reported quantitative assessments of LTA levels are likely underestimating actual LTA levels, since much of the LTA would have been unknowingly discarded.

      If the authors do think that actual conclusions are wrong in previous work, then they need to be more explicit and explain why they were wrong.

      Yes indeed. The statement was toned down in V2.

      L55: Although generally non-essential. I would remove or rephrase this statement. I don't think any TA mutant will survive out in the wild and will be essential under a certain condition. So perhaps not essential for growth under ideal conditions, but for the rest pretty essential.

      The paragraph was amended by qualifying the essentiality to laboratory conditions and including selected references.

      L95: Note that the prevailing model until reference 20 (Gibson and Veening) was that the TA is polymerized intracellularly (see e.g. Figure 2 of PMID: 22432701, DOI: 10.1089/mdr.2012.0026). This intracellular polymerisation model seemed unlikely according to Gibson and Veening ('As TarP is classified by PFAM as a Wzy-type polymerase with predicted active site outside the cell, we speculate that TarP and TarQ polymerize the TA extracellularly in contrast to previous reports.'), but there is no experimental evidence as far as this referee knows of either model being correct.

      Despite the lack of experimental evidence, we think that Gibson and Veening are very likely correct, based on their argument, and also by analogy with the synthesis of other surface polysaccharides from undecaprenyl- or dolichol-linked precursors. It is unfortunate that Figure 2 of PMID: 22432701, DOI: 10.1089/mdr.2012.0026 was published in this way, since there was no evidence for a cytoplasmic polymerization, to our knowledge.

      L97: It is commonly believed, although I'm not sure it has ever been shown, that the capsule is covalently attached at the same position on the PG as WTA. Therefore, there must be some sort of regulation/competition between capsule biosynthesis and WTA biosynthesis (see also ref. 21). The presence of the capsule might thus also influence the characteristics of the periplasmic space. Considering that by far most pneumococcal strains are encapsulated, the authors should discuss this and why a capsule mutant was used in this study and how translatable their study using a capsule mutant is to S. pneumoniae in general.

      A paragraph was added in the Introduction of V2 to present the complication and a sentence was added at the end of the discussion to mention that this should be studied in the future.

      L102: Ref 29 should probably be cited here as well?

      Since in Ref 29 (Flores-Kim et al. 2019) there is a detectable amount of LTA (presumably TA precursors) in the ∆tacL strain, we prefer to cite only Hess et al. 2017 regarding the absence of LTA in the absence of TacL. However, we added in V2 a reference to Flores-Kim et al. 2019 in the following paragraph regarding the role of the LTA/WTA ratio.

      L106: dependent on the presence of the phosphotransferase LytR (21). --> dependent on the presence of the phosphotransferase LytR, whose expression is upregulated during competence (21).

      Corrected in V2.

      L119: I fail to see how the conclusions drawn by other groups (I assume the authors mean work from the Vollmer, Rudner, Bernhardt, Hammerschmidt, Havarstein, Veening groups?) are invalid if they compared WTA:LTA ratios between strains and conditions if they underestimated the LTA levels? Supposedly, the LTA levels were underestimated in all samples equally so the relative WTA/LTA ratio changes will qualitatively give the same outcome? I agree that these findings will allow for a reassessment of previous studies in which presumably too low LTA levels were reported, but I would not expect a difference in outcome when people compared WTA:LTA ratios between strains?

      The sentence was rephrased in V2 to be neutral regarding previous work and rather emphasize future possibilities.

      L131: Perhaps it would be good to highlight that such a conspicuous space has been noticed before by other EM methods (see e.g. Figs. 4 and 5 of ref 19, or one of the clearest TEM S. pneumoniae images I have seen, in Fig. 1F of Gallay et al., Nat. Micro. 2021). However, some sort of staining had always been performed previously, so it was never clear that this was a real periplasmic space. CEMOVIS has the big advantage of being label-free and imaging cells in their presumed native state.

      Thanks for pointing out these beautiful data that we had overlooked. We have added a few sentences and references in the Discussion of V2.

      L201: References are not numbered.

      Corrected in V2.

      L271/L892: Change section title. 'Evolution' can have multiple meanings. It would be more clear to write something like 'Increased TA chain length in stationary phase cells' or something like that.

      Changed in V2.

      L275: harvested

      Corrected in V2.

      L329: add "as shown previously" (I guess refs 24 and 29).

      Reference to Hess et al. 2017 has been added in V2. A sentence and further references to Flores-Kim, 2019, 2022 and Wu et al. 2014 were added at the end of the discussion with respect to the LTA-like signal observed in these studies of ∆tacL strains.

      L337: I think a concluding sentence is warranted here. These experiments demonstrate that membrane-bound TA precursors accumulate on the outside of the membrane, and are likely polymerized on the outside as well, in line with the model proposed in ref. 20.

      From the point of view of formal logic, the accumulation of membrane-bound TA precursors on the outer face of the membrane does not prove that they were assembled there. They could still be polymerized inside and translocated immediately. However, since this is extremely unlikely for the reasons discussed by Gibson and Veening, we have added a mild conclusion sentence and the reference in V2.

      L343: How accurate are these quantifications? Just by looking at the gel, it seems there is much less WTA in the lytR mutant than 50% of the wild type?

      Yes, the 51% value was a calculation error. This was changed to 41%. Likewise, the decrease of the WTA amount relative to LTA was corrected to 5- to 7-fold.

      Apart from the titration of TA in the WT strain, we have not yet carried out a careful quantification of either TA or the LTA/WTA ratio in different strains and conditions, although we intend to do so in the near future using the method presented here.

      However, to better substantiate our statement regarding the ∆lytR strain, we have quantified two experiments of growth in C-medium with azido-choline, and two experiments of pulse labeling in BHI medium. The results are presented in the additional supplementary Fig. S14.

      L342: although WTA are less abundant and LTA appear to be longer (Fig. 6A). --> although WTA are less abundant and LTA appear to be longer (Fig. 6A), in line with a previous report showing that LytR is the major enzyme mediating the final step in WTA formation (ref. 21). (or something like that). Perhaps it is better to start this paragraph differently. For instance: Previous work showed that LytR is the major enzyme mediating the final step in WTA formation (ref. 21). As shown in Fig. 6A, the proportion of WTA significantly decreased in the lytR mutant. However, there was still significant WTA present, indicating that perhaps another LCP protein can also produce WTA.

      Changed in V2.

      Of note, WTA levels would be a lot lower in encapsulated strains as used in Ref. 21 (assuming WTA and capsule compete for the same linkage on PG). So perhaps it would be hard to detect any residual WTA in an encapsulated lytR mutant?

      Investigation of the relationship between TA and capsule incorporation or O-acetylation is definitely a future area of study using this method of TA monitoring.

      L371: see my comments related to L131. Some TEM images clearly show the presence of a periplasmic space.

      Comments and references have been added in V2.

      L402: It would be really interesting to perform these experiments on a wild type encapsulated strain. Would these have much more LTA? (I understand you cannot do these experiments perhaps due to biosafety, but it might be interesting to discuss).

      Yes. It would be interesting to compare the TA in D39 and D39 ∆cps strains. We have added this perspective at the end of the discussion in V2.

      L418: ref lacks number

      Corrected in V2.

      L423: refs missing.

      References added in V2.

      L487: See my comments regarding L46. I do not see one valid point in the current paper why underestimating LTA levels would change any of the conclusions drawn in Ref. 21. I do not know the other papers cited well enough, but it seems highly unlikely that their conclusions would be wrong by systematically underestimating LTA levels. As far as I understand it, this current work basically confirms the major conclusions drawn by these 'doubtful' papers (that TacL makes LTA and LytR is the main WTA producer). As such, I find this sentence highly unfair without precisely specifying what the exact doubts are. Sure, this current paper now shows that probably people have discarded unknowingly LTA and therefore underestimated LTA levels, so any quantitative assessment of LTA levels are probably wrong. That is one thing. But to say this casts doubts on these studies is very serious and unfair (unless the authors provide good arguments to support these serious claims).

      Yes indeed. The sentence was rephrased to be strictly factual in V2.

      Table 2: I assume these strains are delta cps? Would be relevant to list this genotype.

      Table 2 was completed in V2.

      The authors should comment on why the mutants have not been complemented, especially for lytR as it's the last gene in a complex operon. It would be great to see WTA levels being restored by ectopic expression of LytR.

      Yes. We think this could be part of an in-depth study of the attachment of WTA, together with the investigation of the other LCP phosphotransferases.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Joint Public Review:

      Summary:

      The behavioral switch between foraging and mating is important for resource allocation in insects. This study characterizes the role of sulfakinin and the sulfakinin receptor 1 in changes in olfactory responses associated with foraging versus mating behavior in the oriental fruit fly (Bactrocera dorsalis), a significant agricultural pest. This pathway regulates food consumption and mating receptivity in other species; here the authors use genetic disruption of sulfakinin and sulfakinin receptor 1 to provide strong evidence that changes in sulfakinin signaling modulate antennal responses to food versus pheromonal cues and alter the expression of ORs that detect relevant stimuli.

      Strengths:

      The authors utilize multiple complementary approaches including CRISPR/Cas9 mutagenesis, behavioral characterization, electroantennograms, RNA sequencing and heterologous expression to convincingly demonstrate the involvement of the sulfakinin pathway in the switch between foraging and mating behaviors. The use of both sulfakinin peptide and receptor mutants is a strength of the study and implicates specific signaling actors.

      Weaknesses:

      The authors demonstrate that SKR is expressed in olfactory neurons, however there are additional potential sites of action that may contribute to these results.

      Recommendations for the authors:

      The authors have addressed most of the issues raised by the reviewers. Below are a few outstanding issues.

      (1) Lines 68-69 describe "control of B. dorsalis include the use of the behavioral responses to semiochemicals" but does not describe what these responses are or how behavior is modulated.

      The sentence was revised as “Control of B. dorsalis include the use of the reproductive and feeding behavioral responses to semiochemicals” (line 69 in the revision).

      (2) Statistical analysis for 9 hour starved females at 5 minutes is missing in Figure 1D and S1.

      We have added statistical analysis for 9-hour starved females at 5 minutes in the revised Figures 1D and S1, respectively (line 578).

      (3) The legend in Figure S2 should be revised as it is not clear from the figure which of the odors are food associated odors.

      As suggested, we added food odor labels in the revised Figure S2 (line 666).

      (4) Line 167: "Therefore, the upregulated OR genes in starved WT flies, OR7a.4, OR7a.8 and OR10a, were activated by the pheromonal components, while down regulated genes, OR49a and OR63a, were activated by food volatiles." Based on the data, this sentence is incorrect and should read: "Therefore, the upregulated OR genes in starved WT flies, OR7a.4, OR7a.8 and OR10a, were activated by the food components, whereas downregulated genes, OR49a and OR63a, were activated by pheromonal components."

      We are sorry for our mistake. We have corrected it (lines 168-169).

      (5) Line 192: "The coordinated action of sulfakinin on mutiple downstreams,..." should be revised to "downstream pathways or tissues" or simply removing "multiple downstream".

      As suggested, we removed “multiple downstream”. See line 192.

      (6) Reference formatting is inconsistent: see line 207 vs line 208.

      We have corrected it to “(Wu et al., 2019)” (line 207).

      (7) Lines 241-244: The broad discussion regarding the evolution and ancestral function of CCK here and the phylogeny in Figure S6 are peripheral to the authors' claims.

      As suggested, we removed the section and the Figure S6 in the revision.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This research article by Nath et al. from the Lee Lab addresses how lipolysis under starvation is achieved by a transient receptor potential channel, TRPγ, in the neuroendocrine neurons to help animals survive prolonged starvation. Through a series of genetic analyses, the authors identify that TRPγ mutations specifically lead to a failure in lipolytic processes under starvation, thereby reducing animals' starvation resistance. The conclusion was confirmed through total triacylglycerol levels in the animals and lipid droplet staining in the fat bodies. This study highlights the importance of transient receptor potential (TRP) channels in the fly brain to modulate energy homeostasis and combat metabolic stress. While the data is compelling and the message is easy to follow, several aspects require further clarification to improve the interpretation of the research and its visibility in the field.

      Strengths:

      This study identifies the biological meaning of TRPγ in promoting lipolysis during starvation, advancing our knowledge about TRP channels and the neural mechanisms to combat metabolic stress. Furthermore, this study demonstrates the potential of the TRP channel as a target to develop new therapeutic strategies for human metabolic disorders by showing that metformin and AMPK pathways are involved in its function in lipid metabolism during starvation in Drosophila.

      Weaknesses:

      Some key results that might strengthen their conclusions were left out for discussion or careful explanation (see below). If the authors could improve the writing to address their findings and connect their findings with conclusions, the research would be much more appreciated and have a higher impact in the field.

      Here, I listed the major issues and suggestions for the authors to improve their manuscript:

      (1) Are the increased lipid droplet size and the upregulated total TAG level measured in the starved or sated mutant in Figure 1? This information might be crucial for readers to understand the physiological function of TRP in lipid metabolism. In other words, clarifying whether the upregulated lipid storage is observed only in the starved trp mutant will advance our knowledge of TRPγ. If the increase of total TAG level is only observed in the starved animals, TRP in the Dh44 neurons might serve as a sensor for the starvation state required to promote lipolysis in starvation conditions. On the other hand, if the total TAG level increases in both starved and sated animals, activation of Dh44 through TRPγ might be involved in the lipid metabolism process after food ingestion.

      We measured total TAG levels (Figure 1) and LD sizes (Figure 2) under the sated condition. We inserted “under sated condition” to clarify this (lines 97 and 147-148).

      Thanks for your suggestions.

      (2) It is unclear how AMPK activation in Dh44 neurons reduces the total triacylglycerol (TAG) levels in the animals (Figure 3G). As AMPK is activated in response to metabolic stress, the result in Figure 3G might suggest that Dh44 neurons sense metabolic stress through AMPK activation to promote lipolysis in other tissues. Do Dh44 neurons become more active during starvation? Is activation of Dh44 neurons sufficient to activate AMPK in the Dh44 neurons without starvation? Is activation of AMPK in the Dh44 neurons required for Dh44 release and lipolysis during starvation? These answers would provide more insights into the conclusion in Lines 192-193.

      In our previous study, we demonstrated that trpγ mutants exhibited lower levels of glucose, trehalose and glycogen (Dhakal et al., 2022), and in the current study, we observed excessive lipid storage in the trpγ mutant, indicating imbalanced energy homeostasis. Given the established role of AMPK in maintaining energy balance (Marzano et al., 2021; Lin et al., 2021), we employed the activated form of AMPK (UAS-AMPK<sup>TD</sup>) in our experiments. Our results showed that expression of activated AMPK in Dh44 neurons led to a reduction in total TAG levels, suggesting that AMPK activation in these neurons can promote lipolysis even in the absence of starvation. Regarding the activation of Dh44 neurons, Dus et al. (2015) reported that Dh44 cells in the brain are activated by nutritive sugars, especially under starvation conditions. In addition, another report showed a role of Dh44 neurons in regulating starvation-induced sleep suppression (Oh et al., 2023), which may imply that these neurons become more active under starved conditions. Although we did not directly assess whether Dh44 neuron activity increases during starvation or whether AMPK activation in these neurons is required for DH44 release and subsequent lipolysis, our findings support the notion that AMPK activation in Dh44 neurons is sufficient to reduce TAG levels, potentially via the metabolic stress response typically observed during starvation. We explained this as follows: “Dh44 neurons regulate starvation-induced sleep suppression (Oh et al., 2023), which implies that these neurons become more active under starved conditions.” (lines 190-191).

      (3) It is unclear how the lipolytic gene brummer is further downregulated in the trpγ mutant during starvation while brummer is upregulated in the control group (Figure 6A). This result implies that the trpγ mutant was able to sense the starvation state but responded abnormally by inhibiting the lipolytic process rather than promoting lipolysis, which makes it more susceptible to starvation (Figure 3B).

      Thanks for your suggestions. We added the following explanation: “The data indicates that the trpγ mutant can sense the starvation state but responds abnormally by suppressing lipolysis instead of activating it. This dysregulated lipolytic response likely increases the mutant's vulnerability to starvation, as it cannot effectively mobilize lipid stores for energy during periods of nutrient deprivation.” (lines 251-254).

      (4) There is an inconsistency of total TAG levels and the lipid droplet size observed in the Dh44 mutant but not in the Dh44-R2 mutant (Figures 7A and 7F). This inconsistency raises a possibility that the signaling pathway from Dh44 release to its receptor Dh44-R2 only accounts for part of the lipid metabolic process under starvation. Adding discussion to address this inconsistency may be helpful for readers to appreciate the finding.

      Thanks for your suggestion. We included the following in the Discussion: “There is an inconsistency of total TAG levels and the LD size observed in the Dh44 mutant. This inconsistency raises a possibility that the signaling pathway from DH44 release to its receptor DH44R2 only accounts for part of the lipid metabolic process under starvation. While Dh44 mutant flies displayed normal internal TAG levels, Dh44R2 mutant flies exhibited elevated TAG levels. This suggested that the lipolysis phenotype could be facilitated by a neuropeptide other than DH44. Alternatively, a DH44 neuropeptide-independent pathway could mediate the lipolysis.” lines 429-436.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the function of trpγ in lipid metabolism was investigated. The authors found that lipid accumulation levels were increased in trpγ mutants and remained high during starvation; the increased TAG levels in trpγ mutants were restored by the expression of active AMPK in DH44 neurons and oral administration of the anti-diabetic drug metformin. Furthermore, oral administration of lipase, TAG, and free fatty acids effectively restored the survival of trpγ mutants under starvation conditions. These results indicate that TRPγ plays an important role in the maintenance of systemic lipid levels through the proper expression of lipase. Furthermore, the authors have shown that this function is mediated by DH44R2. This study provides an interesting finding in that the neuropeptide DH44 released from the brain regulates lipid metabolism through a brain-gut axis, acting on the receptor DH44R2 presumably expressed in gut cells.

      Strengths:

      Using Drosophila genetics, this study carefully analyzes which trpγ-expressing cells regulate lipid metabolism. The study supports its conclusions from various angles, including not only TAG levels, but also fat droplet staining, survival rate under starved conditions, and oral administration of substances involved in lipid metabolism.

      Weaknesses:

      Lipid metabolism in the DH44R2-expressing cells of the gut should be investigated for a better understanding of the mechanism. Fat accumulation in the gut is not mechanistically linked with fat accumulation in the fat body. The function of lipases in the gut (esp. the R2 region) should be addressed, e.g. by manipulating gut lipases such as magro or Lip3 in the gut in the context of the trpγ mutant. Also, it is not clarified in which gut cell types DH44R2 is expressed. The study also mentions only in the text that bmm expression in the gut cannot restore lipid droplet enlargement in the fat body, but this result might be presented as a figure.

      We appreciate the reviewer’s insightful suggestions. Unfortunately, because the required reagent (UAS-Lip3) was unavailable, we were unable to manipulate gut lipase in trpγ mutants as proposed. However, we additionally performed immunostaining to examine the co-expression of trpγ and Dh44R2 in the gut, and our results indicate that both trpγ and Dh44R2 are co-expressed in the R2 region of the gut (Figure 7O and P). Furthermore, we have updated our figures to show that bmm expression in the gut does not restore lipid droplet enlargement in the fat body (revised Figure 5I and J).

      Reviewer #3 (Public Review):

      In this manuscript, the authors demonstrated the significance of the TRPγ channel in regulating internal TAG levels. They found high TAG levels in TRPγ mutant, which was ascribed to a deficit in the lipolysis process due to the downregulation of brummer (bmm). It was notable that the expression of TRPγ in DH44+ PI neurons, but not dILP2+ neurons, in the brain restored the internal TAG levels and that the knockdown of TRPγ in DH44+ PI neurons resulted in an increase in TAG levels. These results suggested a non-cell autonomous effect of Dh44+PI neurons. Additionally, the expression of the TRPγ channel in Dh44 R2-expressing cells restored the internal TAG levels. The authors, however, did not provide an explanation of how TRPγ might function in both presynaptic and postsynaptic cells in the non-cell autonomous manner to regulate the TAG storage. The authors further determined the effect of TRPγ mutation on the size of lipid droplets (LD) and the lifespan and found that TRPγ mutation caused an increase in the size of LD and a decrease in the lifespan, which were reverted by feeding lipase and metformin. These were creative endeavors, I thought. The finding that DH44+ PI neurons have non-cell autonomous functions in regulating bodily metabolism (mainly sugar/lipid) in addition to directing sugar nutrient sensing and consumption is likely correct, but the paper has many loose ends. I would like to see a revision that includes more experiments to tighten up the findings and appropriate interpretations of the results.

      (1) The authors need to provide interpretations or speculations as to how DH44+ PI neurons have non-cell autonomous functions in regulating the internal TAG stores, and how both presynaptic DH44 neurons and postsynaptic DH44 R2 neurons require TRPγ for lipid homeostasis.

      In the Discussion, we had already mentioned our previous finding: “We previously proposed that TRPγ holds DH44 neurons in a state of afterdepolarization, thus reducing firing rates by inactivating voltage-gated Na+ channels (Dhakal et al., 2022). At the physiological level, this induces the consistent release of DH44 and depletion of DH44 stores, resulting in nutrient utilization and storage malfunctions.”

      We also included the following: “TRPγ in DH44 neurons may influence the release of metabolic signals or hormones that act on postsynaptic DH44R2 cells. These postsynaptic cells could, in turn, modulate lipid storage and metabolism in a non-cell autonomous manner. However, the mechanism by which TRPγ functions in DH44R2 cells remains unclear. One possible explanation is that TRPγ in the gut may be activated by stretch or osmolarity (Akitake et al. 2015).” (lines 439-440).

      This interaction between presynaptic and postsynaptic cells may ensure a coordinated response to metabolic changes and maintain lipid homeostasis. Thus, both Dh44-expressing and Dh44-R2-expressing cells are crucial for the proper functioning of TRPγ in regulating internal TAG levels and lipid storage.

      (2) The expression of TRPγ solely in DH44 R2 neurons of TRPγ mutant flies restored the TAG phenotype, suggesting an important function mediated by TRPγ in DH44 R2 neurons. However, the authors did not document the endogenous expression of TRPγ in the DH44R2+ gut cells. This needs to be shown.

      We appreciate the reviewer’s suggestion. To address this, we performed immunostaining to examine the expression of TRPγ in the DH44R2+ gut cells. Our results, shown in Figure 7O and P, confirm that TRPγ is co-expressed in the Dh44R2+ cells in the gut. We also found that Dh44R2 is expressed in the brain. We documented this as follows: “Given that Dh44R2 is predominantly expressed in the intestine, we performed immunostaining to examine whether Dh44R2 co-localizes with trpγ in gut cells. Our results confirmed that Dh44R2 and trpγ are co-expressed in intestinal cells (Figure 7O and P). Additionally, we analyzed Dh44R2 expression in the brain and found that two Dh44R2-expressing cells are co-localized with Dh44-expressing cells in the PI region (Figure 7Q). To further delineate whether Dh44R2-mediated fat utilization is specific to the brain, gut, or fat body, we knocked down Dh44R2<sup>RNAi</sup> using Dh44-GAL4, myo1A-GAL4, and cg-GAL4, respectively (Figure 7–figure supplement 1E). Notably, knockdown of Dh44R2 with Myo1A-GAL4 resulted in elevated TAG levels, indicating that DH44R2 activity in lipid metabolism is specific to the gut.” (lines 375-384).

      (3) While Dh44 mutant flies displayed normal internal TAG levels, Dh44R2 mutant flies exhibited elevated TAG levels (Figure 7A). This suggested that the lipolysis phenotype could be facilitated by a neuropeptide other than Dh44. Alternatively, a Dh44 neuropeptide-independent pathway could mediate the lipolysis. In either case, an additional result is needed to substantiate either one of the hypotheses.

      The Dh44 mutant flies exhibited normal TAG levels, whereas Dh44R2 mutant flies showed elevated TAG levels. However, when we examined the lipid droplets in the fat body, both Dh44 mutant and Dh44R2 mutant flies displayed larger lipid droplets, indicating a disruption in lipid metabolism. Additionally, we assessed starvation survival time and found that both Dh44 and Dh44R2 mutant flies exhibited reduced survival under starvation conditions compared to controls. Supplementation with lipase (Figure 7–figure supplement 1A), glycerol (Figure 7–figure supplement 1B), hexanoic acid (Figure 7–figure supplement 1C), and mixed TAGs (Figure 7–figure supplement 1D) improved starvation survival time, further supporting that the lipid metabolism pathway was impaired in both mutants. These observations highlight the role of Dh44 in regulating lipolysis. We included related Discussion: “There is an inconsistency of total TAG levels and the LD size observed in the Dh44 mutant. This inconsistency raises a possibility that the signaling pathway from DH44 release to its receptor DH44R2 only accounts for part of the lipid metabolic process under starvation. While Dh44 mutant flies displayed normal internal TAG levels, Dh44R2 mutant flies exhibited elevated TAG levels. This suggested that the lipolysis phenotype could be facilitated by a neuropeptide other than DH44. Alternatively, a DH44 neuropeptide-independent pathway could mediate the lipolysis.” lines 429-436.

      (4) While the authors observed an increased area of fat body lipid droplets (LD) in Dh44 mutant flies (Figure 7F), they did not specify the particular region of the fat body chosen for measuring the LD area.

      We used abdominal segments 2-3 for all fat body images, as already stated under Nile red staining in the Methods section (lines 630-631).

      (5) The LD area only accounts for TAG levels in the fat body, whereas TAG can be found in many other body parts, including the R2 area as demonstrated in Figure 5A-D using Nile red staining. As such, measuring the total internal TAG levels would provide a more accurate representation of TAG levels than the average fat body LD area.

      We measured total internal TAG levels in the whole body throughout the experiments (Figure 1F, 2C, 2E, 3C, 3G, 4A, 4B, 7A, 7I, and many Supplementary Figures), except for bmm expression using the GAL4/UAS system. We now include these new data in Figure 5–figure supplement 1; they support the same conclusion as the LD analysis.

      (6) In Figure 5F-I, the authors should perform the similar experiment with Dh44, Dh44R1, and Dh44R2 mutant flies.

      We performed the experiments with Dh44, Dh44R1, and Dh44R2 mutant flies and found that Dh44 and Dh44R2 mutant flies showed shorter starvation survival than controls, which was extended by supplementation with lipase, glycerol, hexanoic acid and TAG (Figure 7–figure supplement 1A–D; lines 361-372).

      (7) The representative image in Figure 6B does not correspond to the GFP quantification results shown in Figure 6C. In trpγ<sup>1</sup>;bmm::GFP flies, the GFP signal appears stronger in starved conditions than in satiated conditions.

      We updated it with new images. We quantified GFP intensity using ImageJ and found that GFP intensity was significantly lower in the starved condition than in the sated condition in trpγ<sup>1</sup>;bmm::GFP flies.

      (8) In Figure 6H-I, fat body-specific expression of bmm reversed the increased LD area in TRPγ mutants. The authors also showed that Dh44+PI neuron-specific expression of bmm yielded a similar result. The authors need to provide an interpretation as to how bmm acts in the fat body or DH44 neurons to regulate this.

      We first inserted the following in the Results: “Furthermore, the expression of bmm in the fat body, as well as Dh44 neurons in the PI region, can promote lipolysis at the systemic level.” (lines 276-277).

      We also addressed this in the Discussion: “Brummer lipase is essential for regulating lipid levels in the insect fat body by mediating lipid mobilization and energy homeostasis. In Nilaparvata lugens, it facilitates triglyceride breakdown (Lu et al., 2018), while studies in Drosophila show that reduced Brummer lipase expression decreases fatty acids and increases diacylglycerol levels, highlighting its role in lipid metabolism (Nazario-Yepiz et al., 2021). Here, we additionally demonstrate that bmm expression in DH44 neurons within the PI region can systemically regulate TAG levels. Cell signaling or energy status in DH44 neurons may contribute to hormonal release that targets organs such as the fat body.” (lines 451-459).

      (9) The authors should explain why the DH44 R1 mutant did not represent similar results as the wild type.

      We added “In addition, bmm levels in Dh44R1<sup>Mi</sup> under starved condition did not increase as significantly as in the control. This suggests a unique role of DH44 and its receptors in regulating lipid metabolism and response to nutritional status in Drosophila.” lines 358-360.

      (10) It would be good to have a schematic that represents the working model proposed in this manuscript.

      We updated the schematic model in the revised version (Figure 8).

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      This paper characterized the function of trpγ in Dh44-expressing PI neurons for lipid metabolism and lipolysis induced by prolonged starvation. The authors applied a series of lipolytic genetic manipulations and lipid/lipid-metabolism supplements to rescue the trpγ deficits in lipolysis: the expression of active AMPK in the DH44-expressing PI neurons or brummer, a lipolytic gene, in the trpγ-expressing cells, and oral administration of the anti-diabetic drug metformin, lipase, TAG and free fatty acids. Despite this exhaustive characterization of the defective lipolysis in the trpγ mutants, puzzles remain regarding the inconsistent defects of Dh44 and DH44R2 mutants in total TAG levels and regarding the expression and function of the receptor in the gut. Clarification of these points and other issues raised by the reviewers should improve our understanding of the mechanisms of lipid metabolism through Dh44 signalling.

      Reviewer #1 (Recommendations For The Authors):

      (1) It might be worth introducing Dh44 in the introduction section as it is unclear to readers how the authors hypothesized the site-of-action of TRPγ in Dh44 neurons for lipid metabolism after reading the introduction.

      We added the following: “We found that TRPγ expression in Dh44 neuroendocrine cells in the brain is critical for maintaining normal carbohydrate levels in tissues (Dhakal et al. 2022). Building on this, we hypothesized that TRPγ in Dh44 cells also regulates lipid and protein homeostasis.” lines 69-71.

      (2) Providing a summary model in the end to integrate the present findings and their previous publication about TRPγ functions in Drosophila sugar selection would greatly help readers understand and appreciate the general role of TRPγ in balancing energy homeostasis.

      We made a schematic model in Figure 8.

      (3) Swapping the order of Figures 5 and 6 might be a better way to tell the story without logic gaps. The results addressing the mechanisms of metformin and TRPγ in promoting lipolysis under starvation are interrupted by the lipid storage data in the R2 cells in the current Figure 5A-5E. In addition, presenting Figure 5A-5E before or together with Figure 7 will help readers appreciate the expression of Dh44-R2 and its function in regulating lipid metabolism in Figure 7.

      We did.

      (4) It might be misleading to use the word "sated" for the condition of 5-hour mild starvation. The word "mild starvation" or the equivalents might be a better word choice.

      We appreciate the reviewer’s concern. Hemolymph sugar levels do not drop significantly after 5 hr of starvation, and previous papers (Dus et al. 2015, Dhakal et al. 2022) therefore referred to this as the sated condition. For consistency with those studies, we prefer to keep “sated” rather than “mild starvation”.

      (5) It is unclear what the white arrows are pointing at in Figures 7O and 7P. Some of those seem to be non-specific signals, so it is hard to connect the figure to the conclusion in Lines 351-353. It would be helpful to add some explanations to help readers interpret Figures 7O and 7P.

      In the previous version, the white arrows in Figures 7O and 7P indicated the expression of Dh44R2 in the SEZ region of the brain and in the R2 region of the gut. To make this clearer in the revised version, we performed additional immunostaining for co-expression of trpγ and Dh44R2 in the gut. We found that trpγ and Dh44R2 are co-expressed specifically in the R2 region of the gut (Figure 7O and P). Similarly, we found that Dh44R2 is co-expressed in two Dh44 cells in the PI region of the brain (now Figure 7Q). We updated this part. lines 375-380.

      (6) The figure legend for the (G) panel in Figure 2-figure Supplement 1 was mislabeled as (F).

      We corrected it.

      (7) In Line 85, the authors might want to write "… among these mutants, only trpγ mutant displayed reduced carbohydrate levels, suggesting …". Please confirm the information for the sentence. lines 87-88.

      We clarified it.

      Reviewer #2 (Recommendations For The Authors):

      (1) The trpγ[G4] would be difficult for non-Drosophila researchers to understand; it would be better to use trpγ-Gal4.

      We obtained the mutant line from Dr. Craig Montell, who named it. We explained the name in the main text as follows: “controlled by GAL4 knocked into the trpγ locus (trpγ<sup>G4</sup> flies; +)” line 109.

      (2) The arrows in Figures 7O and 7P need to be explained in the figure legends.

      We did.

      Reviewer #3 (Recommendations For The Authors):

      (11) Lines 95-96 should have a reference.

      We did.

      (12) Lines 129-130: It should read "TRPγ expressed in DH44 cells is sufficient for the regulation of lipid levels."

      We changed it as suggested.

      (13) Figure 5E needs to be repeated with more trials.

      We increased the sample size. Previously (Figure 5E), we measured the area of 10 LDs from 3 samples. In the revised figure (Figure 6I), we included 28 LDs from 10 samples.

      (14) In Figures 5F-I, the bold lines are not very visible, so dotted lines could be used instead.

      We changed it as suggested.

      (15) Line 356: It is not true that D-trehalose or D-fructose is commonly detected by DH44 neurons. These sugars at concentrations much higher than the physiological concentration range stimulate DH44 neurons (see Dus et al., 2015).

      We removed it.

      (16) Lines 362-363: It should read "Expression of TRPγ in DH44 neurons was necessary and sufficient to regulate the carbohydrate and lipid levels.".

      We changed it.

      (17) Lines 369-370: The authors need to consider removing the possible role of CRF in regulating lipid homeostasis. It could be considered to be far-fetched.

      We removed it.

      (18) Lines 407-408: the sentence "Nevertheless, it is also known that DH44 neurons mediate the influence of dietary amino acids on promoting food intakes in flies (37)" needs to be removed. That study used amino acid concentrations far greater than the physiological levels observed in the internal milieu of flies. Moreover, many laboratories have been unable to reproduce the result even with the high AA concentrations.

      We removed it.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (public review): 

      This manuscript presents SAVEMONEY, a computational tool designed to enhance the use of Oxford Nanopore Technologies (ONT) long-read sequencing for the design and analysis of plasmid sequencing experiments. In the past few years, improvements in both read length and accuracy have allowed ONT sequencing to be rapidly extended to almost all omics analyses previously dominated by short-read sequencing (e.g., Illumina). However, the relatively high error rate of long-read sequencing techniques, including PacBio and ONT, is still a major obstacle for plasmid/clone-based sequencing services that aim to achieve single-base accuracy. This work provides a guideline for sequencing multiple plasmids together in the same ONT run without molecular barcoding, followed by data deconvolution. The algorithmic framework is well designed, and both real and simulated data are used to support the conclusions. The tool SAVEMONEY is proposed to target users who have their own ONT sequencers and perform library preparation and sequencing by themselves, rather than relying on commercial services. As discussed by the authors, in practice researchers routinely pick multiple colonies from the same plasmid construction and submit them for Sanger sequencing to ensure accuracy. However, SAVEMONEY is not able to support the simultaneous analysis of multiple colonies in the same run, as compared to the barcoding-based approaches. This is a major limitation on the significance of this work. Encouraging computational efforts in ONT data debarcoding for mixed-plasmid or even single-cell sequencing would be more valuable to the field. 

      We thank the reviewer for the positive response to our manuscript and the helpful comments.

      The tool SAVEMONEY is proposed to target users who have their own ONT sequencers and perform library preparation and sequencing by themselves, rather than relying on commercial services.

      We apologize that we were not clear enough in the manuscript. Our tool is designed for users who rely on commercial services (i.e., those who cannot add barcodes themselves). However, it can also benefit those performing their own library preparation, because SAVEMONEY can be applied after standard barcode-based sequencing and demultiplexing. Combining standard barcodes with SAVEMONEY would significantly expand the scope of sequencing applications. For example, it would enable sequencing of more plasmid types than the number of available barcodes and, in some cases, it may even eliminate the need to introduce barcodes. We do not own ONT equipment, and the primary target audience for the SAVEMONEY algorithm is users without ONT equipment, so we were not able to conduct experiments using ONT. However, to clarify these possibilities, we added a dedicated paragraph describing these issues (3rd paragraph in the discussion section).

      However, SAVEMONEY is not able to support the simultaneous analysis of multiple colonies in the same run, as compared to the barcoding-based approaches.

      We agree with the reviewer about this limitation: SAVEMONEY does not allow plasmids from multiple colonies of the same cloning run to be mixed. However, that does not necessarily mean that SAVEMONEY cannot reduce sequencing costs in cloning. For example, when sequencing two colonies from each of three different constructs (six plasmids in total), the standard approach would require sequencing costs for six samples. With SAVEMONEY, however, up to three plasmids (one colony of each construct) can be mixed per sample, allowing them to be sequenced as just two samples. As a result, the sequencing cost per plasmid is reduced to one-third. The greatest benefits can be realized when SAVEMONEY is used at the laboratory level or by multiple researchers. To make this point clearer, we have added sentences in the 5th paragraph of the discussion section.
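      The pooling arithmetic in this example can be sketched as follows. This is a minimal illustration and not part of SAVEMONEY. It assumes a flat price per sample, an equal number of colonies per construct, and that colonies of the same construct never share a sample. The function names are hypothetical.

```python
from fractions import Fraction


def samples_needed(constructs: int, colonies_per_construct: int, max_per_sample: int) -> int:
    """Minimum number of pooled samples when a sample may hold at most one
    colony of each construct and at most `max_per_sample` plasmids."""
    per_sample = min(constructs, max_per_sample)
    total = constructs * colonies_per_construct
    # Ceiling division. Listing plasmids as A1, B1, C1, A2, B2, C2, ... and
    # cutting that list into consecutive chunks of `per_sample` reaches this bound.
    return -(-total // per_sample)


def relative_cost_per_plasmid(constructs: int, colonies_per_construct: int, max_per_sample: int) -> Fraction:
    """Per-plasmid cost relative to sequencing every plasmid as its own sample."""
    total = constructs * colonies_per_construct
    return Fraction(samples_needed(constructs, colonies_per_construct, max_per_sample), total)


# Two colonies from each of three constructs, up to three plasmids per sample.
print(samples_needed(3, 2, 3))             # 2
print(relative_cost_per_plasmid(3, 2, 3))  # 1/3
```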

      (1) To provide more comprehensive information for users who care about cost, the Introduction section should include a more detailed cost comparison between Sanger and ONT, covering different ONT platforms (MinION, PromethION, Flongle), chemistries (flow cells) and kits. This additional information would help the users who have their own sequencers, who are the target audience for SAVEMONEY. 

      We thank the reviewer for pointing this out. Since we do not own ONT equipment, we are unable to provide the total cost of using the ONT platform. However, in the 4th paragraph of the introduction section we have included three pieces of information about the commercial service we used. These are the price per sample (~$15 per plasmid), the equipment it employs (V14 chemistry on a PromethION with an R10.4.1 flow cell), and the number of reads obtained per plasmid (~100–1000). Although these costs will inevitably change over time, this information should still help those who own ONT sequencers estimate their costs.

      (2) In "Overview of the algorithm" (Pages 3-4) under the Results section, the manuscript states "However, coverage varies from ~100-1000 and is difficult to predict because each nanopore flow cell has different properties." It would be more useful to give detailed information here instead, such as read length and yield/read count per flow cell for different platforms. This information would help users design their own experiments effectively. 

      We thank the reviewer for the comment. As mentioned in the previous response, we do not own ONT equipment, so we are unable to provide read length or yield/read count per flow cell. We apologize if the "Overview of the algorithm" section did not make clear that it discusses results obtained from commercial services, and that we should therefore describe the commercial-service results in more detail. We have now clarified in the sentence pointed out by the reviewer that the numbers come from information provided by commercial sequencing services. At the end of the same paragraph, we have also added that typical examples of the result properties (read length and quality score distribution) can be found in Fig. 2.

      (3) This study optimized and evaluated the tool using a total of 14 plasmids, which may not provide sufficient power to represent the diversity of the plasmid world. Consideration should be given to expanding the dataset to include a broader range of plasmids in future studies to enhance the robustness and generalizability of the tool. 

      We are grateful to the reviewer for this valuable input. It is reasonable to expect that larger numbers of plasmids will be used, even though the main targets of SAVEMONEY are users of commercial services. The algorithm itself has no restriction on the number of plasmids. However, the previous version of SAVEMONEY could not finish in a reasonable amount of time when too many plasmids were provided. We have therefore rewritten the underlying code, making it more than 20 times faster than the previous version. On the same dataset and the same computer, the benchmark time mentioned in the 3rd paragraph of the discussion section improved from 65 minutes to 3.1 minutes. SAVEMONEY now also supports multiprocessing, and the processing time is expected to decrease approximately in inverse proportion to the number of CPU cores used. We have added these updates at the end of the 3rd paragraph in the discussion section.

      (4) If applicable and feasible, a comparison or benchmark of SAVEMONEY against other similar tools would further strengthen the manuscript. Such a comparison would allow users to weigh the advantages and disadvantages of different tools for their specific needs. 

      We thank the reviewer for the suggestion. We have added a benchmark against a similar tool, On-Ramp, using exactly the same set of plasmids and FASTQ data as in our benchmark (4th paragraph in the discussion section). The machine specifications of the On-Ramp web server are unknown, so a direct comparison is not possible. However, SAVEMONEY processed the data 38% faster than On-Ramp using only laptop-level computational resources, and 64% faster using mini-PC-level resources.

      (5) The importance of pre-filtering raw sequencing reads should be emphasized as noisy reads can significantly impact the overall performance of the tool. It is essential to clarify whether any pre-filtering steps were performed in this study, such as filtering based on quality scores, read length, or other relevant factors. 

      We apologize for not being clear. Unfortunately, the commercial sequencing service we used did not provide any information about pre-filtering. In theory, however, pre-filtering by quality score and read length has minimal impact on the final results of SAVEMONEY. First, the initial step of the post-analysis is classification, and at this step the user-defined “score_threshold” can exclude reads that are short relative to the full plasmid length. The same threshold also excludes low-quality reads that align poorly to the plasmid, because “score_threshold” applies to the normalized alignment score. Some low-quality reads may not be excluded at this stage. Their effect is still minimized in the final step of the post-analysis, which generates consensus sequences, because our Bayesian analysis considers both the base calls and their q-scores when determining the consensus. Therefore, we believe the overall impact of pre-filtering on the final results is negligible.
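      The sketch below shows how a Bayesian consensus that weighs q-scores limits the influence of low-quality reads. It is a generic per-position model written for illustration, not SAVEMONEY's implementation. It assumes that reads are already aligned to the plasmid, that errors are independent substitutions (indels are ignored), and that the prior over bases is uniform. The function names are hypothetical.

```python
import math

BASES = "ACGT"


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score to an error probability. The value is
    clamped so that q = 0 (error 1.0) becomes uninformative rather than impossible."""
    return min(max(10 ** (-q / 10), 1e-10), 0.75)


def consensus_call(observations):
    """Posterior base call at one aligned position.

    observations: list of (base, phred_q) pairs from the reads covering it.
    Returns (best_base, posterior_probability, consensus_phred_q).
    """
    log_post = {t: math.log(0.25) for t in BASES}  # uniform prior over A/C/G/T
    for base, q in observations:
        e = phred_to_error(q)
        for t in BASES:
            # P(observed base | true base t): correct call, or one of 3 wrong bases
            p = (1 - e) if base == t else e / 3
            log_post[t] += math.log(p)
    # Normalize in log space (log-sum-exp) to avoid underflow at high depth.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    post = {t: math.exp(v - m) / z for t, v in log_post.items()}
    best = max(post, key=post.get)
    err = max(1 - post[best], 1e-10)  # cap so the reported quality stays finite
    return best, post[best], round(-10 * math.log10(err))


# One high-quality "A" outweighs two low-quality "C" calls; a majority vote would pick "C".
print(consensus_call([("A", 30), ("C", 5), ("C", 5)])[0])  # A
```

In this model each read contributes in proportion to its quality. A few noisy reads that pass the classification step therefore shift the posterior only modestly.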

      (6) The statement regarding the number of required reads per plasmid (20-30) and the maximum number of plasmids (up to six) that can be mixed in a single run may become outdated due to the rapid advancements in ONT technology. In the Discussion section, instead of assuming specific numbers, it would be more beneficial to provide information based on the current state of ONT sequencing, such as the number of reads per MinION flow cell that can be produced.

      We thank the reviewer for pointing this out. Because the number of required reads per plasmid depends on the accuracy of each read (i.e., the number of required reads can be reduced if the accuracy increases), we have added the description of these points to the last paragraph of the discussion section.

      Reviewer #2 (public review):  

      The authors developed an algorithm that deconvolutes plasmid sequences from a mixture of plasmids sequenced by nanopore long-read technology. Library preparation and barcoding of individual samples increase sequencing costs. The algorithm bypasses this need and thus reduces sample-prep time and sequencing costs. In the first step, the tool assesses which plasmid constructions can be mixed in a single library preparation. It does this by calculating a distance matrix between the reference plasmids and the constructions, which yields sequence clusters. The user is given groups of plasmids, drawn from different clusters, to be pooled together for sequencing. After sequencing, the algorithm deconvolutes the reads by classifying them based on alignments to the reference sequences. A Bayesian approach is used to obtain a consensus sequence and quality scores. 

      Strengths 

      The authors exploit one of the main advantages of long-read sequencing, its ability to accurately resolve regions of high complexity such as those regularly found in plasmids. They developed a tool that can validate plasmid constructions at reduced sequencing cost. Multiple plasmids (up to six) can be analyzed simultaneously in a single library without sample barcoding, which also reduces sample preparation time. Although inserts must be different, a difference of just 2 bases is enough for correct assignment. The tool maximizes cost-efficiency for projects that require large numbers of plasmid constructions and high-throughput validation. 

      We thank the reviewer for the positive response to our manuscript and the helpful comments.

      Weaknesses 

      The method proposed by the authors requires prior knowledge of the plasmid sequences (i.e., blueprints or plasmid references) and is not suitable for small experiments. The plasmid inserts or backbones must be different; e.g., multiple colonies from the same plasmid construction effort cannot be submitted together. 

      As discussed in the response to Reviewer 1, we agree with the reviewer that SAVEMONEY does not allow plasmids from multiple colonies of the same cloning experiment to be analyzed together. However, that does not necessarily mean that SAVEMONEY cannot reduce the sequencing cost. For example, when sequencing two colonies from each of three different constructs (six plasmids in total), the standard approach would require sequencing costs for six samples. With SAVEMONEY, however, up to three plasmids (one colony of each construct) can be mixed per sample, allowing them to be sequenced as just two samples. As a result, the sequencing cost per plasmid is reduced to one-third. The greatest benefits can be realized when SAVEMONEY is used at the laboratory level or by multiple researchers. To make this point clearer, we have added sentences in the 5th paragraph of the discussion section.

      The reviewer also expressed concern that SAVEMONEY is not suitable for small-scale experiments. More precisely, SAVEMONEY cannot be used when the experiment size is minimal, such as in a lab that consistently constructs only a single plasmid at a time. That said, the strength of SAVEMONEY lies in its scalability. Even in labs that typically construct one plasmid at a time, two or more plasmids may occasionally be created simultaneously, and SAVEMONEY can then be used to reduce sequencing costs. In a typical molecular biology lab where multiple plasmids are constructed every week, SAVEMONEY can be particularly effective. SAVEMONEY is adaptable, can save costs, and has been widely used since its initial publication on bioRxiv and Google Colab. We are therefore confident that it will continue to be a valuable tool for a wide range of researchers.

      Recommendations For The Authors:

      Reviewer #2 (Recommendations For The Authors): 

      The manuscript assumes that all samples are sent to a specific company for sequencing. This could be generalized for much broader use, since many labs now own nanopore sequencers. In turn, the advantage of reducing hands-on sample prep would become more evident. 

      We thank the reviewer for pointing this out. We agree that SAVEMONEY can also benefit those performing their own library preparation. Combining standard barcodes with SAVEMONEY significantly expands the scope of sequencing applications. For example, it enables sequencing of more plasmid types than the number of available barcodes and, in some cases, may even eliminate the sample-prep step that introduces barcodes. Because we do not own ONT equipment, we could not conduct experiments using ONT. However, to clarify these possibilities, we added a dedicated paragraph (3rd paragraph in the discussion section).

      The base calling model (high accuracy, super accuracy) used by Plasmidsaurus and tested here should be mentioned.  

      We thank the reviewer for the suggestion. The description about the base calling model (HAC) was added in Materials and Methods section.

      Other modifications to the revised manuscript 

      Beyond the changes made in response to the reviewer comments above, our continued use and improvement of SAVEMONEY led to additional changes to the algorithm and therefore to the manuscript. These changes are outlined below.

      Improvements in the pre-survey step

      (1) The pre-survey algorithm was reduced to a Zero-One Integer Linear Programming Problem to guarantee the optimal combinations, as previous versions did not ensure an optimal solution. Relatedly, the explanation of the algorithm in the main manuscript was updated.

      (2) The algorithm was modified to ensure that the number of plasmids distributed to each group is balanced. A new feature was also added to allow users to specify the number of groups, which is beneficial when balancing between cost and quality.

      (3) An error was corrected in Fig. 2. The figure showed the Farthest Point Algorithm (complete linkage) as the distance calculation method for the hierarchical clustering step used in group formation. This method calculates the distance between two clusters from their farthest pair of plasmids. The correct method is the Nearest Point Algorithm (single linkage), which uses the closest pair. This error was present only in Fig. 2. Other implementations, including the SAVEMONEY source code and the Google Colab page, were correct from the beginning. We have corrected the error in Fig. 2.
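      The sketch below contrasts the two linkage criteria and then forms pooling groups from the resulting clusters. It is an illustration under stated assumptions, not SAVEMONEY's code. The pairwise distances are hypothetical. Clusters are assumed to merge while their linkage distance is below a threshold, which stands in for distance_threshold. The final grouping uses a simple round-robin assignment rather than the zero-one integer linear program that SAVEMONEY now uses. The round-robin respects two constraints: no group contains two plasmids from the same cluster, and group sizes are balanced. Under those constraints it uses the smallest possible number of groups, but it does not optimize any other objective the ILP may include. All function names are hypothetical.

```python
import math
from itertools import product

# Hypothetical pairwise distances between four plasmids (arbitrary units).
PAIRS = {("p1", "p2"): 2, ("p1", "p3"): 9, ("p1", "p4"): 40,
         ("p2", "p3"): 7, ("p2", "p4"): 35, ("p3", "p4"): 30}
PLASMIDS = ["p1", "p2", "p3", "p4"]


def d(a, b):
    """Symmetric lookup into the hypothetical distance table."""
    if a == b:
        return 0
    return PAIRS[(a, b)] if (a, b) in PAIRS else PAIRS[(b, a)]


def nearest_point(c1, c2):
    """Nearest Point Algorithm (single linkage): closest pair across clusters."""
    return min(d(a, b) for a, b in product(c1, c2))


def farthest_point(c1, c2):
    """Farthest Point Algorithm (complete linkage): most distant pair across clusters."""
    return max(d(a, b) for a, b in product(c1, c2))


def cluster(items, threshold, linkage):
    """Agglomerative clustering: repeatedly merge the two closest clusters
    while their linkage distance is below `threshold`."""
    clusters = [[x] for x in items]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dij = linkage(clusters[i], clusters[j])
                if dij < threshold and (best is None or dij < best[0]):
                    best = (dij, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]


def balanced_groups(clusters, max_per_group):
    """Round-robin assignment: no group receives two members of one cluster,
    and group sizes differ by at most one."""
    n = sum(len(c) for c in clusters)
    g = max(max(len(c) for c in clusters), math.ceil(n / max_per_group))
    ordered = [p for c in clusters for p in c]  # members of a cluster stay contiguous
    groups = [[] for _ in range(g)]
    for i, p in enumerate(ordered):
        groups[i % g].append(p)
    return groups


single = cluster(PLASMIDS, 8, nearest_point)
print(single)                                    # [['p1', 'p2', 'p3'], ['p4']]
print(cluster(PLASMIDS, 8, farthest_point))      # [['p1', 'p2'], ['p3'], ['p4']]
print(balanced_groups(single, max_per_group=3))  # [['p1', 'p4'], ['p2'], ['p3']]
```

With the same distances and threshold, the two criteria produce different clusters. This is why the linkage method shown in the figure matters for group formation.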

      Modifications in figures, manuscripts, and other aspects

      (1) Fig. 3 was updated to reflect the update of SAVEMONEY, although it did not show any important differences.

      (2) Parameter names were updated as follows:

      “threshold (pre)” -> “distance_threshold”

      “threshold (post)” -> “score_threshold”

      Added “number_of_groups”

      (3) The order of elements was rearranged in Fig. 4.

      (4) Incorrect calculations were fixed in Fig. 4g, h, and i (old Fig. 4d, h, and l). Related to that, Fig. 4j, k, and l and Table 1 were added, in addition to the explanation in the main manuscript.

      (5) SAVEMONEY was packaged and was released on PyPI to facilitate easy installation and integration by other developers.

      (6) SAVEMONEY was updated and expanded to accommodate linear DNA fragments, such as PCR amplicons and long synthetic DNA. Users can select the topology of DNA by specifying that as an option. A description of this new capability was added at the end of “Overview of the algorithm” section.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1:

      (…) some concerns with interpretations and technical issues make several major conclusions in this manuscript less rigorous, as explained in detail in comments below. In particular, the two major concerns I have: 1) the contradiction between the strong reduction of global translation, with puromycin incorporation gel showing no detectable protein synthesis in cold, and an apparently large fraction of transcripts whose abundance and translation in Fig. 2A are both strongly increased. 2) The fact that no transcripts were examined for dependence on IRE-1/XBP1 for their induction by cold, except for one transcriptional reporter, and some weaknesses (see below) in data showing activation of IRE-1/XBP-1 pathway. The conclusion for induction of UPR by cold via specific activation of IRE-1/XBP-1 pathway, in my opinion, requires additional experiments.

      Relating to the first point, the results of puromycin incorporation and ribosome profiling are not contradictory. The former shows absolute changes in translation, i.e. changes in how much protein the cell is producing, while the latter shows relative changes between the produced proteins, i.e. how the cell prioritizes its protein production. An observed up-regulation in ribosome profiling does not necessarily mean (but could) that the corresponding protein goes up in absolute terms (units produced per time). Instead, it implies that out of the population of all translating ribosomes, a larger fraction is translating (prioritizing) this particular mRNA relative to other mRNAs. The second point is addressed later in the response.
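      A worked example with hypothetical numbers may make the distinction clearer. It assumes two things: ribosome-profiling counts, normalized to total footprints, are proportional to each mRNA's share of all translating ribosomes, and puromycin incorporation tracks total protein synthesis. The values and the function name are illustrative only.

```python
def translational_share(gene_output: float, global_output: float) -> float:
    """Fraction of all protein synthesis devoted to one mRNA. Ribosome profiling,
    normalized to total footprints, estimates this relative quantity."""
    return gene_output / global_output


# Hypothetical arbitrary units: global synthesis (puromycin-like signal) falls
# from 1000 to 300, while synthesis from one mRNA falls only from 5 to 4.
warm = translational_share(gene_output=5.0, global_output=1000.0)
cold = translational_share(gene_output=4.0, global_output=300.0)

absolute_change = 4.0 / 5.0    # protein made per unit time: down
relative_change = cold / warm  # footprint share: up

print(f"absolute: {absolute_change:.2f}x, relative: {relative_change:.2f}x")
# absolute: 0.80x, relative: 2.67x
```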

      Major concerns:

      (1) Fig. 1B shows polysomes still present on day 1 of 4ºC exposure, but the gel in Fig. 1C suggests a complete lack of protein synthesis. Why?

      We realized that the selected gel exposure may give the false impression of a complete lack of puromycin incorporation at 4ºC. To avoid confusion, we now show in Figure 1 – figure supplement 1 the original gel image next to its longer exposure. The quantification of puromycin incorporation remains in Fig. 1C (it is based on 3 biological replicates and only one replicate is shown in the corresponding supplement). We hope it is now clear that there is an ongoing puromycin incorporation/translation at 4ºC, albeit much reduced compared with 20ºC.

      What is then the evidence that ribosomal footprints used in much of the paper as evidence of ongoing active translation are from actual translating rather than still bound to transcripts but stationary ribosomes, considering that cooling to 4ºC is often used to 'freeze' protein complexes and prevent separation of their subunits? The authors should explain whether ribosome profiling as a measure of active translation has been evaluated specifically at 4ºC, or test this experimentally.

      While the ribosomal profiling alone might not prove ongoing translation, the residual puromycin incorporation does (see the longer gel exposure in Figure 1 – figure supplement 1). To strengthen this argument, we selected two additional genes (cebp-1 and numr-1) whose ribosomal footprints increase in the cold, and whose GFP-fusions were available from the CGC. Monitoring their expression, we observed the expected increase in the cold (see Figure 2 – figure supplement 3 A-B). The ongoing translation in the cold is also in line with our previous study (Peke et al., 2022), where we observed de novo protein synthesis of other proteins under the same cooling conditions as in this study.

      They should also provide some evidence (like Western blots) of increases in protein levels for at least some of the strongly cold-upregulated transcripts, like lips-11.

      As explained above, we addressed it by additionally examining two strains expressing GFP-fused proteins, whose translation in the cold is predicted to increase according to our ribosomal profiling data. See the new Figure 2 – figure supplement 3 A-B.

      As puromycin incorporation seems to be the one direct measure of global protein synthesis here, it conflicts with much of the translation data, especially considering that quite a large fraction of transcripts have increased both mRNA levels and ribosome footprints, and thus presumably increased translation at 4ºC, in Fig. 2A.

      We hope the above explanations put this concern to rest.

      Also, it is not clear how quantitation in Fig. 1C relates to the gel shown, the quantitation seems to indicate about 50-60% reduction of the signal, while the gel shows no discernable signal.

      As above, see the longer western blot exposure in Figure 1 – figure supplement 1, and note that the quantification is based on three biological replicates.

      (2) It is striking that plips-11::GFP reporter is induced in day 1 of 4ºC exposure, apparently to the extent that is similar to its induction by a large dose of tunicamycin (Fig. 3 supplement),

      We did not intend to compare the extent of induction between cold and tunicamycin treatment. The tunicamycin experiment was meant to confirm that lips-11 is upregulated upon UPR activation, as suggested by the expression data of Shen et al. 2005.

      …but the three IRE-1 dependent UPR transcripts from Shen 2005 list were not induced at all on day 1 (Fig. 4 supplement). Moreover, the accumulation of the misfolded CPL-1 reporter, that was interpreted as evidence that misfolding may be triggering UPR at 4ºC, was only observed on day 1, when the induction of the three IRE-1 targets is absent, but not on day 3, when it is stronger. How does this agree with the conclusion of UPR activation by cold via IRE-1/XBP-1 pathway?

      In the originally submitted supplemental figure, we compared mRNA levels between day 1 animals at 20ºC versus 4ºC. However, as argued later by this reviewer, it may be better to use day 0 animals at 20ºC as the reference (since at 20ºC the animals will continue producing embryos). Thus, we repeated the RT-qPCR analysis with additional time points (and genes relevant to other comments). This analysis, now in Figure 4 – figure supplement 2, shows that these mRNAs (dnj-27, srp-7, and C36B7.6) increased already at day 1 in the cold compared with the reference 20ºC animals on day 0, and their levels increased further on day 3.

      It is true that the authors do note very little overlap between IRE-1/XBP-1-dependent genes induced by different stress conditions, but for most of this paper, they draw parallels between tunicamycin-induced and cold induced IRE-1/XBP-1 activation.

      We carefully re-examined the manuscript to ensure that we do not draw parallels between cold and tunicamycin treatment. We took the three genes (dnj-27, srp-7, and C36B7.6) from Shen et al. because that study reported lips-11 as an IRE-1-responsive gene, which we noticed through the WormBase annotation of lips-11. In our expression data, srp-7 (like lips-11) is upregulated more than 2-fold, while the other two genes increase by less than 2-fold. As the reviewer mentions, we note little overlap between the different stress conditions, which suggests that the response is context dependent. Additional differences may arise if, as we hypothesize, UPR is activated in the cold in response to both protein and lipid stress. Note that the 2-fold cutoff used in the previous Figure 7 – figure supplement 1 was (erroneously) applied on the log2 scale, so the figure showed only genes upregulated at least 4-fold. We have now corrected it to 2-fold. A few more genes now overlap, but the overall conclusion that there is little overlap between different conditions did not change. We now list the shared genes in the new Supplementary file 5.
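      The effect of applying the cutoff on the wrong scale can be checked quickly. The fold-change values below are hypothetical.

```python
import math


def passes(fold_change: float, log2_cutoff: float) -> bool:
    """True if a gene's fold change meets a cutoff expressed on the log2 scale."""
    return math.log2(fold_change) >= log2_cutoff


INTENDED = 1.0   # log2(2): a 2-fold cutoff
ERRONEOUS = 2.0  # "2" applied on the log2 scale, i.e. 2 ** 2 = 4-fold

for fc in (1.5, 3.0, 5.0):  # hypothetical fold changes
    print(fc, passes(fc, INTENDED), passes(fc, ERRONEOUS))
# 1.5 False False
# 3.0 True False
# 5.0 True True
```

A gene induced 3-fold therefore meets the intended cutoff but was excluded by the erroneous one.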

      The conclusion that "the transcription of some cold-induced genes reflects the activation of unfolded protein response (UPR)..." is based on analysis of only one gene, lips-11. No other genes were examined for IRE-1 dependence of their induction by cold, neither the other 8 genes that are common between the cold-induced genes here and the ER stress/IRE-1- induced in Shen 2005 (Venn diagram in Figure 7 supplement), nor the hsp-4 reporter. What is the evidence that lips-11 is not the only gene whose induction by cold in this paper's dataset depends on IRE-1? This is a major weakness and needs to be addressed.

      Furthermore, whether induction by cold of lips-11 itself is due to IRE-1 activation was not tested; only a partial decrease of reporter fluorescence by ire-1 RNAi is shown. A quantitative measure of the change of lips-11 transcript in ire-1 and xbp-1 mutants is needed to establish whether it depends on the IRE-1/XBP-1 pathway.

      We have now examined by RT-qPCR whether the induction of the three genes from Shen et al. (dnj-27, srp-7, and C36B7.6), as well as lips-11 and hsp-4, depends on IRE-1. In the new Figure 4 – figure supplement 2, we show that the upregulation of all these genes in the cold is reduced in the ire-1 mutant (although in the wild type, the increase of hsp-4 mRNA appeared to be non-significant, despite the observed upregulation of the hsp-4 GFP reporter).

      The authors could provide more information and the additional data for the transcripts upregulated by both ER stress and cold, including the endogenous lips-11 and hsp-4 transcripts: their identity, fold induction by both cold and ER stress, how their induction is ranked in the corresponding datasets (all of these are from existing data), and do they depend on IRE-1/XBP-1 for induction by cold?

      As above, the dependence of endogenous lips-11 and hsp-4 on IRE-1 is now shown in the new Figure 4 – figure supplement 2, and the shared genes from Figure 7 – figure supplement 1 are listed in the new Supplementary file 5. We did not perform additional analysis comparing various data sets, as we felt that understanding the differences between IRE-1-mediated transcription outputs across different conditions goes well beyond this study.

      Without these additional data and considering that the authors did not directly measure the splicing of xbp-1 transcript (see comment for Fig. 3 below), the conclusion that cold induces UPR by specific activation of IRE-1/XBP-1 pathway is premature.

      To address the splicing of endogenous xbp-1, we examined our ribosome profiling data for the translation of spliced xbp-1, and found that the spliced variant is more abundant in the cold. This data is now shown in Figure 3 – figure supplement 2B.

      There are also technical issues that are making it difficult to interpret some of the results, and missing controls that decrease the rigor of conclusions:

      (1) For RNAseq and ribosome occupancy, were the 20ºC day 1 adult animals collected at the same time as the other set was moved to 4ºC, or were they additionally grown at 20ºC for the same length of time as the 4ºC incubations, which would make them day 2 adults or older at the time of analysis? This information is only given for SUnSET: "animals were cultivated for 1 or 3 additional days at 4ºC or 20ºC".

      In the RNAseq experiments, the 20ºC animals were collected at the same time as the others were moved to 10ºC (and then 4ºC), so they were not additionally grown at 20ºC. We now make this clear in Methods.

      This could be a major concern in interpreting translation data: First, the inducibility of both UPR and HSR in worms is lost at exactly this transition, from day 1 to day 2 or 3 adults, depending on the reporting lab (for example Taylor and Dillin 2013, Labbadia and Morimoto, 2015, De-Souza et al 2022).

      As explained above, the 20ºC animals were collected at the same time as the others were moved to 4ºC. Moreover, we previously reported that ageing appears to be suppressed in animals incubated at 4ºC (Habacher et al., 2016; Figure S1C). Thus, in terms of their biological age, cold-incubated animals appear to be closer to the 20ºC animals at the time they are moved to the cold (day 0). The ageing-associated decline in UPR inducibility mentioned above therefore presumably does not apply to cold-incubated animals, which is in line with the observed IRE-1-dependent upregulation of several genes in day 3 animals at 4ºC.

      How do authors account for this? Would results with reporter induction, or induction of IRE-1 target genes in Fig. 4, change if day 1 adults were used for 20ºC?

      Our analysis in Figure 4 – figure supplement 2 now includes 20ºC animals at day 0, 1, and 3.

      Second, if animals at the time of shift to 4ºC were only beginning their reproduction, they will presumably not develop further during hibernation, while an additional day at 20ºC will bring them to the full reproductive capacity. Did 4ºC and 20ºC animals used for RNAseq and ribosome occupancy have similar numbers of embryos, and were the embryos at similar stages?

      As explained above, the reference animals at 20ºC were young adults containing few embryos. Indeed, at 4ºC the animals do not accumulate embryos. Although we cannot say that for all genes, note that the genes analysed in Figure 4 – figure supplement 2 increase in abundance also when compared with the day 3 animals kept at 20ºC.

      (2) Second, no population density is given for most of the experiments, despite the known strong effects of crowding (high pheromone) on C. elegans growth. From the only two specifics that are given, it seems that very different population sizes were used: for example, 150 L1s were used in survival assay, while 12,000 L1s in SUnSET. Have the authors compared results they got at high population densities with what would happen when animals are grown in uncrowded plates? At least a baseline comparison in the beginning should have been done.

      None of the experiments involved crowded populations. In the SUnSET experiments, we just used larger and more plates to obtain sufficient material.

      (3) Fig. 3: it is unclear why the accepted and well characterized quantitative measure of IRE-1 activation, the splicing of the xbp-1 transcript, is not determined directly by RT-PCR. The fluorescent XBP-1 spliced reporter, to my knowledge, has not been tested for its quantitative nature and thus its use here is insufficient. Furthermore, the image of this fluorescent reporter in Fig. 3b shows only one anterior-most row of cells of the intestine, and quantitation was done with 2 to 5 nuclei per animal, while lips-11 is induced in the entire intestine. Was there spliced XBP-1 in the rest of the intestinal nuclei? Could the authors show/quantify the entire animal (20 intestinal cells) rather than one or two rows of cells?

      As explained above, we have now included the analysis of xbp-1 splicing in Figure 3 – figure supplement 2B. As for the fluorescent reporter, it is difficult to measure all gut nuclei since part of the gut is occluded by the gonad. Nonetheless, we do see induction of the reporter in other gut nuclei and now show additional examples from the midgut in Figure 3 – figure supplement 2A.

      (4) The differences in the outcomes from this study and the previous one (Dudkevich 2022) that used 15ºC to 2ºC cooling approach are puzzling, as they would suggest two quite different IRE-1 dependent programs of cold tolerance. It would be good if authors commented on overlapping/non-overlapping genes, and provided their thoughts on the origin of these differences considering the small difference in temperatures.

      Indeed, there seem to be substantial differences between different temperatures and cooling paradigms. While understanding of the C. elegans responses to cold is still in its infancy, one possible explanation for the observed differences is that we used different starting growth temperatures. While the initial populations in our study were grown at 20ºC, Dudkevich et al. used 15ºC. Worms display profound physiological differences between these two temperatures. For example, Xiao et al. (2013) showed that the cold-sensitive TRPA-1 channel is important at 15ºC but not at 20ºC. Thus, the trajectories along which worms adapt to near-freezing temperature may vary depending on their initial physiological state (and perhaps the target temperature, as we used 4ºC and they used 2ºC). We have now expanded the discussion of this point in Discussion. We should also note that we had planned to test NLP-3 function in our paradigm, but our request for strains remained unanswered.

      Second, have the authors performed a control where they reproduced the rescue by FA supplementation of poor survival of ire-1 mutants after the 15ºC to 2ºC shift? Without this or another positive control, and without measuring change in lipid composition in their own experiments, it is unclear whether the different outcomes with respect to FAs are due to a real difference in adaptive programs at these temperatures, or to failure in supplementation?

      While we did not re-examine the findings by Dudkevich et al., we have now included another positive control. As reported by Hou et al. (2014), supplementing unsaturated FAs rescues the induction of the hsp-4 reporter in fat-6 RNAi-treated animals. Although we were able to reproduce that result (Figure 6 – figure supplement 1), the same supplementation procedure did not suppress the lips-11 reporter (Figure 6 – figure supplement 2).

      (5) Have the authors tested whether and by how much ire-1(ok799) mutation shortens the lifespan at 20ºC? This needs to be done before the defect in survival of ire-1 mutants in Fig. 7a can be interpreted.

      The lifespan at standard cultivation temperature was examined by others (Henis-Korenblit et al., 2010; Hourihan et al., 2016), showing that ire-1(ok799) mutants live shorter. However, while some mechanisms that prolong lifespan may also improve cold survival, the two phenomena are not identical, and whether IRE-1 facilitates longevity and cold survival in the same or different ways remains to be seen.

      Reviewer #2:

      (1) The conclusions regarding a general transcriptional response are based on one gene, lips-11, which does not affect survival in response to cold. We would suggest altering the title to replace "Reprograming gene expression" with "Regulation of the lipase lips-11".

      We have now examined the IRE-1-dependent induction of additional genes (see Figure 4 – figure supplement 2). While we do not know what fraction of cold-induced genes depends on IRE-1, we feel that our findings justify the statement that gene expression in the cold involves the IRE-1/XBP-1 pathway (title) or that the transcription of some/a subset of cold-induced genes depends on this pathway (in the abstract, model, and discussion).

      (2) There is no gene ontology with the gene expression data.

      We now included the top 10 most enriched and suppressed gene categories between 10ºC and 4ºC (since the biggest change happens between these conditions, as shown in Figure 2 – figure supplement 1A). This is now included in the Figure 2 – figure supplement 2.

      (3) Definitive conclusions regarding transcription vs translational effects would require use of blockers such as α-amanitin or cycloheximide.

      As explained also for reviewer 1, we confirmed now that at least some genes, whose translation is upregulated based on the ribosome profiling, are indeed upregulated in the cold at the protein level (Figure 2 – figure supplement 3A-B). Thus, the increase in ribosomal occupancy seems to accurately reflect increased translation. Since mRNA levels correlate overall with the ribosomal occupancy, it appears that the mRNA levels are the main determinants of the translation output. Because the lips-11 promoter is sufficient to upregulate the GFP reporter in the cold, it further suggests that the regulation happens at the transcription level. It is true that at this point we cannot completely rule out the effects of mRNA stability, which we clearly acknowledge in the discussion.

      (4) Conclusions regarding the role of lipids are based on supplementation with oleic acid or choline, yet there is no lipid analysis of the cold animals, or after lips-1 knockdown.

      We agree that this is an important direction for future studies but feel that lipidomic analysis goes beyond the scope of current work.

      Although choline is important for PC production, adding choline in normal PC could have many other metabolic impacts and doesn't necessarily implicate PC without lipidomic or genetic evidence.

      We agree and acknowledge it now in Discussion: “However, choline also plays other roles, including in neurotransmitter synthesis and methylation metabolism. Thus, we cannot yet rule out the possibility that the protective effects of choline supplementation stem from functions outside PC synthesis.”

      Reviewer #3:

      The study has several weaknesses: it provides limited novel insights into pathways mediating transcriptional regulation of cold-inducible genes, as IRE-1 and XBP-1 are already well-known responders to endoplasmic reticulum stress, including that induced by cold.

      We presume the reviewer refers to the study by Dudkevich et al. (2022). As explained in our manuscript, there are important differences between that study and ours in how the IRE-1 signalling is utilized and to what ends.

      Additionally, the weak cold sensitivity phenotype observed in ire-1 mutants casts doubt on the pathway's key role in cold adaptation. The study also overlooks previous research (e.g., PMID: 27540856) that links IRE-1 to SKN-1, another major stress-responsive pathway, potentially missing important interactions and mechanisms involved in cold adaptation.

      We state in the manuscript that the IRE-1 pathway plays a modest but significant role in cold adaptation and state in the Fig. 7 model and Discussion that additional pathways work alongside IRE-1 to drive cold-specific gene expression.

      Recommendations for the authors:

      Reviewer #1:

      Minor comments:

      (1) Fig. 2B - reporter expression seems to be already present in the intestine of 20ºC animals. What is the turnover rate of GFP in the intestine and how is it affected by the temperature shift? If GFP degradation is inhibited, could it explain the increase in signal in 4ºC animals, rather than increased transcription? This seems to be true for the hsp-4 transcriptional reporter, as the GFP fluorescence appears to increase during 4ºC incubation (Fig. 4a), but the hsp-4 message levels are only increased after 1 day but not in later days at 4ºC, based on the RNAseq in provided dataset. How well do changes in lips-11 reporter fluorescence correspond to the changes in the endogenous lips-11 transcript?

      Note that increased GFP fluorescence is accompanied by increased mRNA levels. In addition to the RNAseq data, we have now examined changes of the endogenous lips-11 transcript by RT-qPCR and observed its strong (and IRE-1-dependent) upregulation in the cold (see Figure 4 – figure supplement 2). Moreover, we have now included two other examples of GFP-tagged proteins whose fluorescence increases in the cold, concomitant with increased mRNA levels and ribosomal occupancy (Figure 2 – figure supplement 3A-B).

      (2) Descriptions of methods to measure different aspects of translation are very abbreviated and in some places make it difficult to understand the paper. One example - what is RFP in Fig. 2a?

      We have now replaced "RFP" with "RPF" (ribosome-protected fragment), and the abbreviation is explained the first time it is used.

      (3) How was the effectiveness of RNAi at 4ºC validated?

      As explained in Methods, we subjected animals to RNAi long before they were transferred to 4ºC, so the corresponding protein is depleted prior to cooling.

      (4) Several of the conclusions on translation and ribosomal occupancy are written in a somewhat confusing way. For example, the authors state that "shift from 10ºC to 4ºC had a strong effect" when describing "impact on translation (ribosomal occupancy)" (page 4), but in the next sentence, they state "a good correlation between mRNA levels and translation (Figure 2A)". Was ribosomal occupancy normalized to the transcript abundance?

      We do not perceive any discrepancy between the two statements. The former refers to the difference between time points, where we observed the largest change in both the transcriptome and ribosomal occupancy from 10ºC to 4ºC (as can be inferred from the PCA plot in Figure 2 – figure supplement 1). The latter refers to the observation that changes in mRNA levels mirrored, in most cases, similar changes in the ribosomal occupancy.

      The ribosomal occupancy was not normalized, as that would essentially normalize the y-axis (ribosomal occupancy) with the x-axis (mRNA), and so express changes in “translational efficiency” as a function of changes in mRNA abundance. While this type of analysis can also reveal interesting biological phenomena, it would explore a different question.
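      To make the distinction concrete: translational efficiency (TE) is commonly defined per gene as the ratio of RPF abundance to mRNA abundance, so on the log2 scale the change in TE is simply the RPF log2 fold change minus the mRNA log2 fold change. The sketch below is an illustration only, assuming per-gene log2 fold changes are already available; the function name and values are hypothetical and do not come from our datasets.

```python
def log2_te_change(log2_fc_rpf: float, log2_fc_mrna: float) -> float:
    """Change in translational efficiency (TE = RPF / mRNA) on the log2 scale.

    log2(RPF / mRNA) = log2(RPF) - log2(mRNA), so the TE fold change is
    the difference of the two log2 fold changes.
    """
    return log2_fc_rpf - log2_fc_mrna

# Hypothetical gene: mRNA up 4-fold (log2 FC = 2) and RPF up 4-fold (log2 FC = 2).
print(log2_te_change(2.0, 2.0))  # 0.0: occupancy tracks mRNA, TE unchanged
```

      In this framing, a gene whose ribosomal occupancy rises in step with its mRNA shows no change in TE even though its translational output increases, which is why the unnormalized occupancy addresses a different question than TE.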

      (5) "For most transcripts ... increased the abundance of a particular protein appears to correlate depend primarily on the abundance of its mRNA" (page 5). This is an overstatement, the protein levels were not quantified.

      As explained above, we now additionally monitored the expression of two GFP-tagged proteins (CEBP-1 and NUMR-1). Monitoring their expression, we observed the expected increase in GFP fluorescence in the cold (see Figure 2 – figure supplement 3 A-B). While we did not examine them also by western blot, these observations are in line with our conclusions.

      (6) The statement "Since transcription is the main determinant of mRNA levels, these results suggest that cold-specific gene expression primarily depends on transcription activation" seems to assume that message degradation doesn't have much of an impact at 4ºC. What is the evidence here? The authors themselves later suggest either transcription or mRNA stability in Discussion.

      While we cannot exclude that the mRNA stability of some genes may be affected, this concern is more valid for the messages that go down in the cold. Although we have done it for only selected genes, each time we observed an increase in mRNA levels, we also observed a corresponding increase in the protein (this study and Pekec et al., 2022). Moreover, the lips-11 reporter was designed to monitor the activity of its promoter, which we showed is sufficient to upregulate the reporter GFP in the cold. We have now expanded the corresponding paragraph in Discussion, which will hopefully come across as more balanced.

      Reviewer #2:

      (1) Alter title, conclusions to better reflect specific nature of the work.

      We now provided additional data and feel that it justifies our conclusions and title.

      (2) Use Gene Ontology searches to look at patterns of gene expression in RNA seq data.

      We now show it in Figure 2 – figure supplement 2.

      (3) Use genetic or lipidomic tools rather than solely adding exogenous lipids.

      We agree that lipidomic analysis is an important direction for future research, but feel that lipidomic analysis and further genetic experiments go beyond the scope of current manuscript.

      Reviewer #3:

      To strengthen the evidence for the role of IRE-1 in cold adaptation, the authors might consider performing additional functional assays, such as testing the effects of IRE-1 and XBP-1 mutations under varying cold conditions and testing the genetic interaction of ire-1 with xbp-1, skn-1, and hsf-1 in cold sensitivities. It is also worth using alternative approaches such as independent alleles of ire-1, knockdowns or tissue-specific knockouts (without potential developmental compensation in global constitutive mutants) to better characterize the contribution of IRE-1 to cold adaptation. Additionally, studies that examine tissue-specific responses to cold exposure could provide important insights, as different tissues may utilize distinct molecular pathways to adapt to cold stress.

      We also tested ire-1 and xbp-1 functions by RNAi-mediated depletion. SKN-1 is a good candidate for future studies, but Horikawa et al. (2024) showed that HSF-1 is not required for cold dormancy (at 4ºC); we also now show that HSF-1::GFP does not increase in the cold (Figure 2 – figure supplement 3C).

      This reviewer also recommends clarifying the novelty of your findings in the context of existing literature, particularly regarding the established roles of IRE-1 and XBP-1 in responding to endoplasmic reticulum stress.

      The entry point of this study was to clarify a long-standing problem in hibernation research, i.e., the apparent discrepancy between global translation repression and de novo gene expression observed in the cold. By connecting the cold-mediated expression of some genes to the IRE-1/XBP-1 pathway, we strengthen the argument for transcription-mediated gene regulation in hibernating animals. We did go the extra mile to test the possible reason behind the activation of the ER unfolded protein response in the cold but feel that a deeper analysis deserves a separate study.

      The term "hibernation" should be avoided or reworded since the study does not provide direct behavioral or physiological evidence for hibernation-like states; instead, the manuscript could refer to "cold-induced responses" or "adaptations to cold temperatures."

      The term “hibernation” was used before even in the context of the C. elegans dauer state, which, arguably, is even less appropriate. In addition to a global suppression of translation shown here, we reported before that the same cooling regime suppresses ageing (Habacher et al., 2016; Figure S1C). Incubating at 4ºC also arrests C. elegans development (Horikawa et al., 2024). Thus, while the worm and mammalian hibernation are certainly not equivalent – which we clearly spell out – we like to use “hibernation” interchangeably with “cold dormancy” to draw attention to a fascinating aspect of C. elegans biology. Still, we use now quotation marks in the title to avoid misunderstanding.

      The discussion could be strengthened by addressing the relevance of prior studies, such as those linking IRE-1 to SKN-1 (PMID: 27540856), TRPA-1 (PMID: 23415228), ZIP-10 (PMID: 29664006), HSF-1 (PMID: 38987256) in cold adaptation and elaborating on how your findings provide new

      The IRE-1/SKN-1 and ZIP-10 papers are now mentioned when describing the model in Figure 7. The TRPA-1 and HSF-1 papers are cited when discussing physiological differences between different cold temperatures. Consistent with our studies, the HSF-1 paper shows that nematodes enter a dormant state at 4ºC (but at 9ºC and higher temperatures continue developing). Importantly, HSF-1 promotes development at 9ºC but is not important for the arrest at 4ºC. We also now show in Figure 2 – figure supplement 3C that HSF-1 does not go up at 4ºC.

    1. Author response:

      Reviewer #1 (Public Review):

      (1) The authors conclude that the committed progenitors revert to GSCs based on the coexpression of nanos2 and foxl2l and based on expression of id1 in mutants but not in WT. Without functional data demonstrating that the progenitors revert to an earlier state, alternative interpretations should be considered. For example, it is possible that the cells initiate the committed progenitor program but continue to express the GSC program and that the coexpression of both programs blocks differentiation.

      Thanks for your insightful comment. We have explored possible alternative interpretations of our data. Regarding the suggested possibility of a continued GSC program in the mutant, we have examined the expression of GSC markers, including nanos2, in the mutant at different stages. We found that in the mutant, nanos2 and other GSC markers were not significantly upregulated in the GSC-to-progenitor transition (G-P) and early progenitors (Prog-E) (Fig. 4B). The expression of these GSC markers was also low in the integrated clusters I4-I6, when the G-P and Prog-E stages were prominent (Fig. 3D and Fig. 3E). The GSC marker nanos2 was high only in mutant Prog-C. These results argue against a continued GSC program in the foxl2l mutants. Another possible explanation is that some mutant Prog-C cells acquire a GSC property through the upregulation of nanos2, rather than maintaining a continuous GSC program. We have now clarified our rationale about mutant cells gaining new GSC properties and included both interpretations in the Results.

      Consistent with this possibility, some Fox family members, FoxL2 and FoxPs for example, are known to be both activators and repressors of transcription or act primarily as repressors. Potentially relevant to this work, repressive activity of FoxL2 has been previously reported in the mammalian ovary (Pisarska et al Endocrinology 2004, Pisarska Am J. Phys Endo. Metabolism 2010, Kuo Reproduction 2012, Kuo Endocrinology 2011, as well as more recent publications). In that context interfering with FoxL2 was proposed to cause upregulated expression of genes normally repressed by FoxL2, accelerated follicle recruitment, and premature ovarian failure.

      FoxL2 exerts both activating and repressive activities. We believe that Foxl2l can also activate and repress its target genes. Although its target genes have not been clearly identified, Foxl2l may activate genes involved in processes such as oogenic meiosis, and may also repress genes involved in other processes, perhaps including nanos2.

      (2) The authors conclude that the committed progenitor stage is "the gate toward female determination" and that the cells "stay at S-Phase temporarily before differentiation". This conclusion seems to be based solely on single cell RNAseq expression. In several species, including zebrafish, meiotic entry occurs earlier in females and has been correlated with ovary development. The possibility that the late progenitor stage, the stage when meiotic genes are detected in this study and a stage missing in foxl2l mutants, is actually the key stage for female determination cannot be excluded by the data provided.

      We agree that Prog-L is important for the initiation of female meiosis. We have revised the text to point out the importance of Prog-L in female differentiation.

      (3) The authors discuss prior work showing that loss of germ cells leads to male development and that germ cells are required for female development and claim to extend that work by showing here that some progenitors are already sexually differentiated. First, the stages compared are completely different. The earlier work looks at the primordial germ cells and their loss in the first few days of development before a gonad forms. In contrast, this work examines stages well after the gonad has formed and during sex determination.

      Both previous studies and our study indicate the important role of germ cells in zebrafish sex differentiation during gonadal development. The earlier works show that the abundance of primordial germ cells contributes to sex differentiation. Our current findings further suggest the existence of a female identity in some germ cells at the juvenile stage, and we discuss the importance of these cells in sexual differentiation. We have added the developmental age in our study to emphasize the age difference.

      The second concern is that the conclusion that the progenitors are differentiated is based solely on the expression of foxl2l, which is initially expressed in the juvenile ovary state that lab strains have been shown to develop through (Wilson et al Front Cell Dev Bio 2024). While it is fair to state that some cells express ovary markers at this stage, it is unclear that this is sufficient evidence that the cells are differentiated.

      The conclusion about the differentiation of progenitors is not based solely on foxl2l expression; rather, it is based on the whole transcriptomic profiles of both WT (Figure 1B) and foxl2l mutant cells (Figure 3A), as well as on the foxl2l mutant phenotype (Figure 2C). Three types of progenitors, Prog-E, Prog-C, and Prog-L, were identified by whole transcriptomic analysis in WT. In foxl2l mutants, the transcriptomic profile further shows that Prog-L and meiotic cells are completely lost, and all germ cells eventually undergo male differentiation. Together, these results indicate that the differentiation of Prog-C to Prog-L guides the progenitors toward female differentiation. Our results also showed that in the juvenile gonad, foxl2l expression is high in two types of progenitors, Prog-C and Prog-L, and becomes low after meiotic entry.

      For example, in the context of the foxl2l mutant, the authors observe that GSCs and early progenitors inappropriately express foxl2l, but the mutants develop as males. Thus, expression of foxl2l transcripts alone is insufficient evidence to claim that the cells are already differentiated as female.

      The foxl2l mutants develop into males because they lack functional Foxl2l. Although the mutated foxl2l transcript is present in mutant cells, these transcripts are not functional. These mutants develop into males eventually. This result is consistent with our claim that functional Foxl2l is important for the development of Prog-L and female differentiation.

      (4) The comparison between medaka and zebrafish foxl2l mutants seems to suggest that Foxl2l is required for meiosis in medaka but has a different role in zebrafish. However, if foxl2l represses the earlier developmental programs of GSCs and early progenitors, it is possible that continued expression of these early programs interferes with activation of meiotic genes. This could account for the absence of the late progenitor stage in foxl2l mutants since the late progenitor stage is defined by and distinguished from the earlier stages by expression of foxl2l and meiotic genes. If so, foxl2l may be similarly required in both systems.

      Medaka and zebrafish Foxl2l may share similar functions, such as the stimulation of meiotic gene expression and the promotion of oogenesis in female germ cells preparing for meiotic entry. In addition, we also detected aberrant upregulation of nanos2 in some foxl2l mutant cells. The idea that "continued expression of these early programs interferes with activation of meiotic genes" is conceivable, but for now we have no evidence for it. We do not know whether the absence of meiotic genes is due to interference caused by the activation of nanos2 or due to the complete loss of Prog-L and meiotic cells. It will also be interesting to find out whether medaka Foxl2l has a role in early progenitors.

      (5) The authors state that "Foxl2l may ensure female differentiation by preventing stemness and antagonizing male development." It is unclear why suppressing stemness would be necessary for female differentiation since female zebrafish have stem cells as do male zebrafish. It seems likely that turning off the GSC and early differentiation programs is important for allowing expression of meiosis and oocyte differentiation genes, and that a gene other than Foxl2l is required for differentiation from GSCs to spermatocytes.

      It is true that we have not proved whether suppression of stemness is required for female differentiation. Maybe our earlier statement is a bit misleading. We agree that it is likely that turning off the GSC and early differentiation programs is important for allowing expression of meiotic and oocyte differentiation genes, and that a gene other than Foxl2l is required for differentiation from GSCs to spermatocytes. To avoid confusion, we have modified our statement in the text.

      (6) Based on its expression in mutant progenitors, p53 is proposed to assist with alternative differentiation of mutant germ cells. Although p53 transcripts are expressed, no evidence is provided that p53 is involved in differentiation of germ cells, and sex bias has not been associated with the published p53 mutants in zebrafish. Furthermore, while p53 has been shown to be important for ovary-to-testis transformation in mutant contexts in adults, it appears dispensable for testis development in mutants that disrupt ovary differentiation at earlier stages (Rodriguez-Mari et al PLoS Gen 2010, Shive PNAS 2010, Hartung et al Mol. Reprod. Dev 2014, Miao Development 2017, Kaufman et al PLoS Gen 2018, Bertho et al Development 2021). It is possible that p53 eliminates foxl2l mutant germ cells that are simultaneously expressing multiple developmental programs, but this possibility would need to be tested.

      The tp53<sup>-/-</sup>foxl2l<sup>-/-</sup> double mutant does not rescue the all-male phenotype of the foxl2l<sup>-/-</sup> mutant (Dev Biol, 517, 91-99, 2024), indicating that male development is not due to p53-mediated germ cell apoptosis. We have cited the suggested papers and compared the role of tp53 in the mutants described in them (fancl, zar1, etc.). Since tp53 was enriched in certain foxl2l<sup>-/-</sup> mutant cell clusters, and tp53 mutation fails to rescue the all-male phenotype, p53 expressed in these mutant cell clusters may have roles other than inducing apoptosis. One possibility is that p53 is involved in germ cell differentiation, particularly as p53 is known to promote the differentiation of airway epithelial progenitors, adipogenesis, and the differentiation of embryonic stem cells. We have emphasized in the Discussion that this proposed role of p53 in germ cell differentiation is a hypothesis.

      Reviewer #3 (Public Review):

      This is the first report to show that a transcription factor, foxl2l, is essential for the development of female germ cells. Without foxl2l, germ cells develop into sperm. The report also clearly defines the stage at which early germ cells arrest in foxl2l mutants, that is, the stages at which foxl2l is critical for the further development of female germ cells.

      (1) Due to the lack of cell lineage tracing, the claim, based on gene expression and cell number changes, that foxl2l suppresses the dedifferentiation of progenitor cells to GSCs is weak.

      Thank you for your comments highlighting both the contributions and the weaknesses of our work. We acknowledge the lack of direct evidence for the reversion of mutant Prog-C to GSCs in our data, and we have now removed the claim that Foxl2l represses stemness.

      (2) In addition, separating early germ cell types in foxl2l mutants using marker genes from WT may not be optimal.

      The cell types of mutant cells were determined by two independent analyses. The first inferred the developmental stage of mutant cells. This approach assumes that mutant cells can indeed be mapped to specific WT stages through their transcriptomic profiles. However, as the reviewer points out, mutant cells were heterogeneous and can be distinct from WT cells, so defining mutant cell types by WT markers may not be optimal. To address this, we conducted a second analysis, co-clustering, in which mutant cells and early-stage WT cells (GSC, G-P, Prog-E, Prog-C(S) and Prog-C) were clustered together. This approach does not assume a direct correspondence between mutant and WT developmental stages. Instead, it facilitates the identification of novel germ cell types in mutants while characterizing the relationship between WT and mutant cells. Some clusters contained both WT and mutant cells, indicating high transcriptomic similarity. Other clusters consisted almost entirely of mutant cells, indicating distinct mutant cell types (Figure 3C). We can, therefore, assign developmental properties to these mutant cells with confidence.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      The aim of this study is to test the overarching hypothesis that plasticity in BNST CRF neurons drives distinct behavioral responses to unpredictable threat in males and females. The manuscript provides evidence for a possible sex-specific role for CRF-expressing neurons in the BNST in unpredictable aversive conditioning and subsequent hypervigilance. As the authors note, this is an important question given the high prevalence of sex differences in stress-related disorders, like PTSD, and the role of hypervigilance and avoidance behaviors in these conditions. The study includes in vivo manipulation, bulk calcium imaging, and cellular-resolution calcium imaging, which yield important insights into cell-type-specific activity patterns. However, it is difficult to draw an overall conclusion from this manuscript, given that many of the results are inconsistent across sexes and across tests, and there is an overall lack of converging evidence. For example, partial conditioning yields increased startle in males but not females, yet CRF KO only increases startle response in males after full conditioning, not partial, and CRF neurons show similar activity patterns between partial and full conditioning across sexes. Further, while the study includes a KO of CRF, it does not directly address the stated aim of assessing whether plasticity in CRF neurons drives the subsequent behavioral effects of unpredictable threat.

      We appreciate the reviewer’s summary and agree that the results are highly complex and that it was difficult to generate a simple model or conclusion to summarize our work. This is an unfortunate side effect of examining both sexes across different conditioning paradigms; however, we believe that it is important to convey this information to the field even without a simple answer. Our data reinforce the very important findings from the Maren and Holmes groups that partial fear is a different process from full fear, and that the BNST plays a differential role here. We have reworded the manuscript to better convey this complexity.

      A major strength of this manuscript is the inclusion of both males and females and the attention to possible behavioral and neurobiological differences between them throughout. However, to properly assess sex differences, sex should be included as a factor in ANOVA (e.g., for the freezing, startle, and feeding data in Figure 1) to assess whether there is a significant main effect of, or interaction with, sex. If sex is not a statistically significant factor, both sexes should be combined for subsequent analyses. See Garcia-Sifuentes and Maney, eLife 2021 https://elifesciences.org/articles/70817. There are additional cases where t-tests are used to compare groups when repeated-measures ANOVAs would be more appropriate and rigorous.

      We agree with the reviewer that this is the more appropriate analysis and have changed the analysis and figures throughout the revised manuscript to better assess sex differences as well as differences between fear conditions.

      Additionally, it's unclear whether the two sexes are equally responsive to the shock during conditioning and whether this underlies some of the differences in behavioral and neuronal effects observed. Some reports suggest that shock sensitivity differs across sexes in rodents, and thus using a standard shock intensity for both males and females may confound the effects in this study.

      This is a great point. We have conducted the appropriate analysis (Sex by Tone repeated-measures two-way ANOVAs for each group: Ctrl, Full, Part), and there are no sex differences in freezing between males and females. The extent of conditioning does not differ between the groups, suggesting that if there were a difference in shock sensitivity, it is not driving any discernible differences in behavioral performance. However, it is possible that the experience of the shock differs between males and females even in the absence of any measurable behavioral difference.

      The data do not rule out that BNST CRF activity is purely tracking the mobility state of the animal, given that the differences in activity also track with differences in freezing behavior. The data show an inverse relationship between activity and freezing. This may explain a paradox in the data, namely why males show a greater suppression of BNST activity after partial conditioning than after full conditioning, if that activity is thought to drive the increased anxiety-like response. Perhaps activity is strongly suppressed at the end of the conditioning session because animals are likely to be freezing continuously after repeated shock presentations in that context. This would also explain why there is less suppression of activity over the course of the recall session, because there is also less freezing during recall than during conditioning.

      While it is possible that the BNST tracks activity, we believe it is not purely tracking mobility state. For instance, while freezing increases across tone exposures in Part fear regardless of sex, males show an increase and females a reduction in BNST response during tone 5 (Fig 2K). If BNST activity purely tracked mobility state, the inverse relationship between BNST activity and freezing noted by the reviewer would predict the opposite response. The same holds for BNST<sup>CRF</sup> activity in response to the first and last tones during recall. Despite the suppression of activity over the course of recall (Fig 5K), BNST<sup>CRF</sup> tone responses increase from tone 1 to tone 6 in males and decrease in females (Fig 6M), again suggesting that the BNST is responding to more than just activity.

      A mechanistic hypothesis linking BNST CRF neurons, the behavioral effects observed after fear conditioning, and manipulation of CRF itself is not clearly addressed here.

      We disagree with this assertion. The data suggest a model in which males respond with increased arousal, and Part fear males show persistent activation of the BNST and BNST<sup>CRF</sup> neurons during fear conditioning and recall, while Part fear females show the opposite response. This female response differs from the role the field has ascribed to the BNST in sustained fear. Additionally, we show that CRF knockdown does not affect fear differentiation or fear expression in males, whereas it enhances fear learning and recall in females. We have reworded the manuscript to highlight these novel findings.

      Reviewer #2 (Public Review):

      This study examined the role of CRF neurons in the BNST in both phasic and sustained fear in males and females. The authors first established a differential fear paradigm in which shocks were either consistently paired with tones (Full) or paired with tones only 50% of the time (Part); controls were exposed only to tones, with no shocks. Recall tests established that both Full and Part conditioned male and female mice froze to the tones, with no difference between the paradigms. Additional studies using the NSF and startle tests established that neither fear paradigm produced behavioral changes in the NSF test, suggesting that these fear paradigms do not increase anxiety-like behavior. Part fear conditioning, but not Full, enhanced startle responses in males but not females, suggesting that this fear paradigm produced sustained increases in hypervigilance exclusively in males.

      Thank you for this clear summary of the behavioral work.

      Photometry studies found that while undifferentiated BNST neurons all responded to the shock itself, only Full conditioning in males led to a progressive enhancement of the magnitude of this response. BNST neurons in males, but not females, were also responsive to tone onset in both fear paradigms, but only in Full fear did the magnitude of this response increase across training. Knockdown of CRF from the BNST had no effect on fear learning in males or females, nor any effect on fear recall in males in either paradigm, but in females it enhanced both baseline and tone-induced freezing, only in the Part fear group. When anxiety was assessed following fear training, CRF knockdown in males modulated anxiety in Part fear trained animals and amplified startle in Fully trained animals, but had no effect in either test in females. Using 1P imaging, the authors found that CRF neurons in the BNST generally declined in activity across both conditioning and recall trials, with some subtle sex differences emerging in the Part fear trained animals: in females, BNST CRF neurons were inhibited after both shock and omission trials, but in males this only occurred after shock and not omission trials. In recall trials, CRF BNST neuron activity remained higher in Part conditioned mice relative to Full conditioned mice.

      Overall, this is a very detailed and complex study that incorporates both differing fear training paradigms and males and females, as well as a suite of state-of-the-art imaging techniques and gene knockdown approaches to isolate the role and contributions of CRF neurons in the BNST to these behavioral phenomena. The strengths of this study come from the thorough approach that the authors have taken, which in turn helped to elucidate nuanced and sex-specific roles of these neurons in the BNST in differing aspects of phasic and sustained fear. Moreover, the methods employed provide a strong degree of cellular resolution for CRF neurons in the BNST. In general, the conclusions appropriately follow the data, although the authors do tend to minimize some of the inconsistencies across studies (discussed below), which could be better addressed through a more detailed discussion. As such, the primary weakness of this manuscript is that the mixed findings are discussed and interpreted without the level of detail and nuance that reflects the complexity, and somewhat inconsistent results, across the studies. These points are detailed below:

      - Given the focus on CRF neurons in the BNST, it is unclear why the photometry studies were performed in undifferentiated BNST neurons as opposed to CRF neurons specifically (although this is addressed, to some degree, subsequently with the 1P studies in CRF neurons directly). This does limit the continuity of the data from the photometry studies to the subsequent knockdown and 1P imaging studies. The authors should address the rationale for this approach so it is clear why they have moved from broader to more refined approaches.

      The reviewer raises a good point. We performed preliminary photometry studies of BNST CRF neurons and found that the signal was poorly time-locked. We reasoned that this was due to the heterogeneity of the cells' activity, as we observed in our previous publication (Yu et al.). Because of this, we moved to 1P imaging in place of continued BNST CRF photometry. We have also reworded the manuscript to better discuss the complexities and inconsistencies in findings across the studies.

      - The CRF KD studies are interesting, but it remains speculative as to whether these effects are mediated locally in the BNST or due to CRF signaling at downstream targets. As the literature on local pharmacological manipulation of CRF signaling within the BNST seems to be largely performed in males, the addition of pharmacological studies here would benefit this to help to resolve if these changes are indeed mediated by local impairments in CRF release within the BNST or not. While it is not essential to add these experiments, the manuscript would benefit from a more clear description of what pharmacological studies could be performed to resolve this issue.

      We agree with the reviewer that the addition of this experiment would be highly informative for differentiating the role of CRF in the BNST. This is something that will need to be considered moving forward and we have added this as a point of discussion.

      - While I can appreciate the authors' perspective, I think it is more appropriate to state that startle correlates with anxiety as opposed to outright stating that startle IS anxiety. Anxiety by definition is a behavioral cluster involving many outputs, of which avoidance behavior is key. Startle, like autonomic activation, correlates with anxiety but is not the same thing as a behavioral state of anxiety (particularly when the startle response dissociates from behavior in the NSF test, which more directly tests avoidance and apprehension). Throughout the manuscript, "anxiety" and "vigilance" are used interchangeably to describe startle, but the authors also dissociate the two, such as in the first paragraph of the discussion when stating that the Part fear paradigm produces hypervigilance in males without influencing fear or anxiety-like behaviors. The manuscript would benefit from harmonizing the language used to operationally define these behaviors, and my recommendation would be to remain consistent with the description that startle represents hypervigilance and not anxiety, per se.

      The reviewer raises an excellent point, we have clarified in the revised manuscript.

      - The interpretation of the anxiety data following CRF KD is somewhat confusing. First, while the authors found no effect of fear training on behavior in the NSF test in the initial studies, now they do; however, somewhat contrary to what one would expect, they found that Full fear trained males had reduced latency to feed (indicative of an anxiolytic response), which was unaltered by CRF KD, whereas in Part fear (which appeared to have no effect on its own in the NSF test), KD of CRF produced an anxiolytic effect. Given that the Part fear group was no different from control here, these data are difficult to interpret, as CRF KD now does reduce latency to feed in this group, suggesting that removal of CRF somehow conveys an anxiolytic response in Part fear animals. In the discussion, the authors refer to this outcome as CRF KD "normalizing" the behavior of Part fear conditioned animals in the NSF test, as it now parallels what is seen after Full fear. However, given that the Part fear animals with GFP were no different than controls (and neither fear training paradigm produced any effect in the NSF test in the first arm of studies), it seems inappropriate to refer to this as "normalization", as it is unclear what is being normalized. Given the complexity of these behavioral data, greater depth is required in the discussion to put these data in context and describe the nuance of these outcomes; in particular, it would be good to discuss experimental factors that differed between the initial behavioral studies and the CRF KD arm and that could explain the discrepancy in the NSF test (such as the inclusion of surgery). These behavioral outcomes are even more complex given that the opposite effect was found in startle, whereby CRF KD amplified startle in Full trained animals. As such, this portion of the discussion requires some reworking to more adequately address the complexity of these behavioral findings.

      The reviewer raises a good point, and we agree that there are many inconsistencies in the behaviors. We believe it is still valuable to report these results, and we have expanded our discussion of potential reasons for these behavioral inconsistencies in the manuscript.

      Reviewer #3 (Public Review):

      Hon et al. investigated the role of BNST CRF signaling in modulating phasic and sustained fear in male and female mice. They found that partial and full fear conditioning had similar effects in both sexes during conditioning and during recall. However, males in the partially reinforced fear conditioning group showed enhanced acoustic startle compared to the fully reinforced fear conditioning group, an effect not seen in females. Using fiber photometry to record calcium activity in all BNST neurons, the authors show that the BNST was responsive to foot shock in both sexes and both conditioning groups. Shock response increased over the session in males in the fully conditioned fear group, an effect not observed in the partially conditioned fear group. This effect was not observed in females. Additionally, tone onset resulted in increased BNST activity in both male groups, with the tone response increasing over time in the fully conditioned fear group. This effect was less pronounced in females, with partially conditioned females exhibiting a larger BNST response. During recall in males, BNST activity was suppressed below baseline during tone presentations and was significantly greater in the partially conditioned fear group. Both female groups showed an enhanced BNST response to the tone that slowly decayed over time. Next, they knocked down CRF in the BNST to examine its effect on fear conditioning, recall, and anxiety-like behavior after fear. They found no effect of the knockdown in either sex or group during fear conditioning. During fear recall, BNST CRF knockdown led to an increase in freezing only in the partially conditioned females. In the anxiety-like behavior tasks, BNST CRF knockdown led to increased anxiolysis in partially reinforced fear males, but not in females. Surprisingly, BNST CRF knockdown increased the startle response in fully conditioned, but not partially conditioned, males, an effect not observed in either female group.

      In a final set of experiments, the authors used single-photon calcium imaging to record BNST CRF cell activity during fear conditioning and recall. Approximately one-third of BNST CRF cells were excited by shock in both sexes, with the rest inhibited; no differences were observed between sexes or groups during fear conditioning. During recall, BNST CRF activity decreased in both sexes, an effect more pronounced in the male and female fully conditioned fear groups.

      Overall, these data provide novel, intriguing evidence in how BNST CRF neurons may encode phasic and sustained fear differentially in males and females. The experiments were rigorous.

      We thank you for this positive review of our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There are several graphs representing different analyses of (presumably) the same group of subjects, but which have different N/group. For example, in Figure 2:

      (1) Fig 2P seems to have n=10 in Part Male group (Peak), but 2Q only has n=9 in Part Male group (AUC)

      (2) Fig 2S seems to have n=10 in Part Female group (Peak), but 2T only has n=7 in Part Female group (AUC)

      (3) Fig 2G (Tone Resp) has n=6 Full Males but 2F (Tone Resp), 2H (Shock Resp), and 2I (Shock Resp) have n=7 Full Males

      (4) Fig 2K (Tone Resp) has n=7 Full Females but 2L (Tone Resp), 2M (Shock Resp), and 2N (Shock Resp) have n=8 Full Females

      (5) Fig 2L (Tone Resp) has n=9 Part Females but 2K (Tone Resp), 2M (Shock Resp), and 2N (Shock Resp) have n=10 Part Females

      It's possible that this is just due to overlapping individual data points which are made harder to see due to the low resolution of the figures. If so, this can be easily rectified. However, there may also be subjects missing from some analyses which must be clarified or corrected.

      Thank you for catching these. We have gone through and fixed any issues with data points, and we have added statistics and dataset exclusions to the figure legends to explain these discrepancies.

      Regarding statistical tests:

      (2) Data in Figs 2G and 2I should be analyzed using a two-way RM ANOVA.

      We have now included sex as a factor in most of our analysis and are now using appropriate statistical tests.

      (3) Data in Fig 3K should be analyzed using a two-way RM ANOVA.

      We are now using appropriate statistical tests.

      Calcium activity in response to the shock during conditioning and in response to the tone during recall should be included in Figure 5. Given that partial and full animals also receive unequal presentations of the cue, it would be useful to see the effects trial by trial or normalized to the first 3 presentations only.

      The reviewer raises a great point. We have revised this figure and added the responses to shock and tones. Since we are most interested in the difference between sustained and phasic fear, we compared tone 3 in Full fear with tone 4 in Part fear, which differ in the ambiguity of their cue but are separated by only one tone presentation.

      Histology maps should be included for all experiments depicting viral spread and implant location for all animals, in addition to the included representative histology images. These can be placed in the supplement.

      We agree this would be helpful. While we have confirmed that all animals in every experiment were hits, the tissue is no longer in a condition suitable for this analysis.

      Referring to the quantification of peaks in fiber photometry and cellular-resolution calcium imaging data as "spikes" is a bit misleading given the inexact relationship between GCaMP sensor dynamics/calcium binding and neuronal action potentials; perhaps calling it "event" frequency would be clearer.

      We have changed the references of spikes to events as suggested.

      The legend for Figure 2S is mislabeled as A.

      Thank you for catching this mistake, it has been fixed.

      The methods refer to CRFR1 fl/fl animals but it seems no experiments used these animals, only CRF fl/fl.

      We have fixed this, thank you.

      Reviewer #2 (Recommendations For The Authors):

      As stated in the public review, while I think the addition of local pharmacological studies blocking CRF1 and CRF2 receptors in the BNST in both males and females, done under the same conditions as all of the other testing herein, would help to resolve some of the speculation in interpreting the CRF KD data, I don't think these studies are essential, but it would be good for the authors to state more explicitly which studies could be done and how they could facilitate interpretation of these data.

      Thank you for this suggestion. We have added this discussion into the manuscript.

      Aside from this, my other recommendations are for the authors to address more clearly the discrepancies in behavioral outcomes across studies, to describe explicitly their rationale for the sequence of experiments performed, and to harmonize how they operationally define anxiety.

      Again, we appreciate these great suggestions. We have added more discussion on the behavioral discrepancies as well as rationale for the experiments. We have also changed the wording to remain consistent that the NSF test relates to anxiety and the Startle test relates to vigilance.

      - In Figure 2, Panel S is listed as Panel A in the caption and should be corrected.

      Thank you for catching this mistake, we have fixed it.

      Reviewer #3 (Recommendations For The Authors):

      My biggest concerns regard the interpretations and some conclusions from this data set, which I have stated below.

      (1) It was surprising to see minimal and somewhat conflicting behavioral effects of BNST CRF knockdown. The authors provide a representative image and address this in the conclusion. They mention the role of local vs. projection CRF circuits as well as the role of GABA. I don't think those experiments are necessary for this manuscript. However, it may be worthwhile to examine BNST CRF levels by in situ hybridization or IHC after both the full and partial fear conditioning paradigms. Additionally, it would help to see a quantification of the knockdown in the animals.

      Thank you for these great suggestions. We will consider them for future experiments. We piloted some CRF sensor experiments to probe this, but it was unclear whether the sensor's signal-to-noise ratio was sufficient. We hope to pursue this further in the future, should we obtain funding for this work.

      The authors can add a figure showing deltaF/F changes from control.

      We did not have control mice in these in vivo experiments. Our main interest lies specifically in understanding the differences between the Full and Part fear conditioning paradigms.

      (2) Related to the previous point, it was surprising to see an effect of the CRF deletion in the full fear group, but not the partial fear group, in the acoustic startle task. To strengthen the conclusion about differential recruitment of CRF during phasic and sustained fear, the experiment in my previous point could help. Alternatively, intra-BNST administration of a CRF antagonist before the acoustic startle test after both conditioning tasks could also help, as could patch-clamp recordings from BNST CRF neurons after the conditioning tasks to measure intrinsic excitability. Not all of these experiments are needed to support the conclusion; these are only examples.

      We thank the reviewer for these suggestions and agree that these are important experiments. We will consider this in future experiments exploring the role of BNST CRF in fear conditioning.

      (3) In Figure 5F and K, the authors report data combined across partial and full fear conditioning. Were there any differences in the number of excited or inhibited neurons between the conditioning groups?

      We are only looking at the first shock exposure in these figures. These were combined because the first tone and shock exposure is identical in Full and Part fear conditioning. Differences in these behavioral paradigms emerge after Tone 3 exposure, where Part fear does not receive a shock while Full fear does.

      Also, can the authors separate male and female traces in Fig 5 E and P?

      Traces in Fig 5E are from females only. We did not include male traces because males and females had identical responses to the first shock, and we felt that one example trace was sufficient. Traces in Fig 5P are from males. We did not show female traces because females did not show differential effects from baseline to end.

      (4) Also, regarding the calcium imaging data, what was the average length of a transient induced by shock? Were there any differences between the sexes?

      We recorded many cells in each condition, and the duration of shock-evoked transients varied widely and was hard to quantify; for example, some cells were already active before the shock. Therefore, to keep the analysis consistent across mice and conditions and to avoid ambiguity about transient length, we focused on a fixed 10-second window after the shock.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      In this study, Osiurak and colleagues investigate the neurocognitive basis of technical reasoning. They use multiple tasks from two neuroimaging studies and overlap analysis to show that area PF is central to reasoning and plays an essential role in tool-use and non-tool-use physical problem-solving, as well as in both conditions of the mentalizing task. They also demonstrate the specificity of technical reasoning, finding that area PF is not involved in the fluid-cognition task or the mentalizing network (INT+PHYS vs. PHYS-only). This work advances our understanding of the neurocognitive basis of the technical reasoning that supports advanced technologies.

      Strengths:

      -The topic this study focuses on is intriguing and can help us understand the neurocognitive processes involved in technical reasoning and advanced technologies.

      -The researchers obtained fMRI data from multiple tasks. The data is rich and encompasses the mechanical problem-solving task, psychotechnical task, fluid-cognition task, and mentalizing task.

      -The article is well written.

      We sincerely thank Reviewer 1 for their positive and very helpful comments, which helped us improve the manuscript.

      Weaknesses:

      - Limitations of the overlap analysis method: there are multiple reasons why two tasks might activate the same brain regions. For instance, the two tasks might share cognitive mechanisms, the activated regions of the two tasks might be adjacent but not overlapping at finer resolutions, or the tasks might recruit the same regions for different cognition functions.

      Thus, although overlap analysis can provide valuable information, it also has limitations.

      Further analyses that capture the common cognitive components of activation across different tasks are warranted, such as correlating the activation across different tasks within subjects for a region of interest (i.e., the PF).

      We thank Reviewer 1 for this comment. We added new analyses to address the two alternative interpretations stressed here by Reviewer 1, namely, the same-region-but-different-function interpretation and the adjacency interpretation. The new analyses ruled out both alternative interpretations, thereby reinforcing our interpretation.

      “The conjunction analysis reported was subject to at least two key limitations that needed to be overcome to ensure a correct interpretation of our findings. The first was that the tasks could recruit the same regions for different cognitive functions (same-region-but-different-function interpretation). The second was that the activated regions of the different tasks could be adjacent but not overlap at finer resolutions (adjacency interpretation). We tested the same-region-but-different-function interpretation by conducting additional ROI analyses, which consisted of correlating the specific activation of the left area PF (i.e., the difference in mean Blood-Oxygen Level Dependent [BOLD] parameter estimates between the experimental and control conditions) in the psychotechnical task, the fluid-cognition task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. This analysis did not include the mechanical problem-solving task because the sample of participants was not the same for this task. As shown in Fig. 5, we found significant correlations between all the tasks that were hypothesized as recruiting technical reasoning, i.e., the psychotechnical task and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .05). By contrast, no significant correlation was obtained between these three tasks and the fluid-cognition task (all p > .15). This finding invalidates the same-region-but-different-function interpretation by revealing a coherent pattern in the activation of the left area PF in situations in which participants were supposed to reason technically. We examined the adjacency interpretation by analysing the specific locations of individual peak activations within the left area PF ROI for the mechanical problem-solving task, the psychotechnical task, the fluid-cognition task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. 
These peaks, which corresponded to the maximum value of activation obtained for each participant within the left area PF ROI, are reported in Fig. 6. As can be seen, the peaks of the fluid-cognition task were located more anteriorly, in the left area PFt (Parietal Ft) and the postcentral cortex, compared to the peaks of the other four tasks, which were more posterior, in the left area PF. Statistical analyses based on the y coordinates of the individual activation peaks confirmed this description (Fig. 6). Indeed, the y coordinates of the peaks of the mechanical problem-solving task, the psychotechnical task and the PHYS-Only and INT+PHYS conditions of the mentalizing task were posterior to the y coordinates of the peaks of the fluid-cognition task (all p < .05), whereas no significant differences were reported between the four tasks (all p > .05). These findings speak against the adjacency interpretation by revealing that participants recruited the same part of the left area PF to perform tasks involving technical reasoning.” (p. 11-13)
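      The individual peak-location analysis described above can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline: it assumes that each participant's contrast map (experimental minus control) is a NumPy array on the same voxel grid as a boolean ROI mask, with a 4x4 voxel-to-MNI affine, and the helpers `peak_in_roi` and `peak_y_coordinates` are hypothetical names. The response does not state which statistical test was applied to the y coordinates, so the sketch stops at extracting them.

```python
import numpy as np


def peak_in_roi(contrast_map, roi_mask, affine):
    """Return the (x, y, z) world coordinates, in mm, of the maximum
    contrast value inside the ROI (hypothetical helper)."""
    # Voxels outside the ROI are set to -inf so argmax ignores them
    masked = np.where(roi_mask, contrast_map, -np.inf)
    ijk = np.unravel_index(np.argmax(masked), contrast_map.shape)
    # Map voxel indices to world coordinates with the 4x4 affine
    xyz1 = affine @ np.array([*ijk, 1.0])
    return xyz1[:3]


def peak_y_coordinates(contrast_maps, roi_mask, affine):
    """Anterior-posterior (y) coordinate of each participant's ROI peak;
    larger y values are more anterior in MNI space."""
    return np.array([peak_in_roi(m, roi_mask, affine)[1] for m in contrast_maps])


# Toy example: a 10x10x10 grid with 2 mm isotropic voxels and no offset
affine = np.diag([2.0, 2.0, 2.0, 1.0])
roi = np.zeros((10, 10, 10), dtype=bool)
roi[:6, :6, :6] = True
cmap = np.zeros((10, 10, 10))
cmap[3, 4, 5] = 5.0   # peak inside the ROI
cmap[8, 8, 8] = 9.0   # larger value, but outside the ROI, so it is ignored
peak = peak_in_roi(cmap, roi, affine)  # voxel (3, 4, 5) -> (6, 8, 10) mm
```

      The per-participant y values from each task would then enter whatever between-task comparison is chosen; masking with -inf rather than cropping keeps the returned indices valid for the original grid and affine.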

      - Control tasks may be inadequate: the tasks may involve other factors, such as motor/action-related information. For the psychotechnical task, fluid-cognition task, and mentalizing task, the experimental tasks require processing not only technical-cognition information but also motor-related information, whereas the control tasks do not require motor-related information (mainly visual shape information). Additionally, there may be no difference in motor-related information between the conditions of the fluid-cognition task. Therefore, the regions of interest may be sensitive to motor-related information, affecting the research conclusion.

      We thank Reviewer 1 for this comment. We added a specific section in the discussion that addresses this limitation.

      “The second limitation concerns the alternative interpretation that the left area PF is not central to technical reasoning but to the storage of sensorimotor programs about the prototypical manipulation of common tools. Here we show that the left area PF is recruited even in situations in which participants do not have to process common manipulable tools. For instance, some items of the psychotechnical task consisted of pictures of a tractor, a boat, a pulley, or a cannon. The fact that we found a common activation of the left area PF in such tasks as well as in the mechanical problem-solving task, in which participants could nevertheless simulate the motor actions of manipulating novel tools, indicates that this brain area is not central to tool manipulation but to physical understanding. That being said, some may suggest that viewing a boat or a cannon is enough to incite the simulation of motor actions, so our tasks were not equipped to distinguish between the manipulation-based approach and the reasoning-based approach. We have already shown that the left area PF is more involved in tasks that focus on the mechanical dimension of the tool-use action (e.g., the mechanical interaction between a tool and an object) than on its motor dimension (i.e., the interaction between the tool and the effector [e.g., 24, 40]). Nevertheless, we recognize that future research is still needed to test the predictions derived from these two approaches.” (p. 18-19)

      -Negative results require further validation: the cognitive results for the fluid-cognition task in the study may need more refinement. For instance, when performing ROI analysis, are there any differences between the conditions? Bayesian statistics might also be helpful to account for the negative results.

      We agree that our negative results required further validation. We conducted the ROI analyses suggested by Reviewer 1, which confirmed the initial whole-brain analyses.

      “Region of interest (ROI) results. We conducted additional analyses to test the robustness of our findings. One of our results was that we found no specific activation of the left area PF in the fluid-cognition task, unlike in the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. However, this negative result needed exploration at the ROI level. Therefore, we created a spherical ROI of the left area PF with a radius of 12 mm, centred on MNI coordinates (–59; –31; 40). This ROI was literature-defined to ensure the independence of its selection (40). ROI results are shown in Fig. 4. The analyses confirmed the results obtained with the whole-brain analyses by indicating a greater activation of the left area PF in the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .001), but not in the fluid-cognition task (p = .35).” (p. 10-11)
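      A literature-defined spherical ROI of this kind can be built directly from voxel coordinates. The Python sketch below is illustrative only: it assumes beta maps resampled to the standard 2 mm MNI152 grid (91 x 109 x 91 voxels), a resolution the response does not state, and `spherical_roi` and `specific_activation` are hypothetical helper names. Specific activation follows the definition given in the authors' reply: the mean experimental-condition parameter estimate minus the mean control-condition estimate within the ROI.

```python
import numpy as np

# Assumed 2 mm MNI152 grid; the response does not report the voxel size
SHAPE = (91, 109, 91)
AFFINE = np.array([
    [-2.0, 0.0, 0.0, 90.0],
    [0.0, 2.0, 0.0, -126.0],
    [0.0, 0.0, 2.0, -72.0],
    [0.0, 0.0, 0.0, 1.0],
])


def spherical_roi(center_mm, radius_mm, shape=SHAPE, affine=AFFINE):
    """Boolean mask of voxels whose centres lie within radius_mm of center_mm."""
    # All voxel indices as an (N, 3) array, in C (row-major) order
    ijk = np.indices(shape).reshape(3, -1).T
    # Homogeneous voxel coordinates -> world (mm) coordinates
    xyz = np.c_[ijk, np.ones(len(ijk))] @ affine.T
    dist = np.linalg.norm(xyz[:, :3] - np.asarray(center_mm, dtype=float), axis=1)
    return (dist <= radius_mm).reshape(shape)


def specific_activation(beta_exp, beta_ctrl, roi_mask):
    """Mean experimental minus mean control parameter estimate inside the ROI."""
    return float(beta_exp[roi_mask].mean() - beta_ctrl[roi_mask].mean())


# Left area PF sphere: 12 mm radius around MNI (-59, -31, 40), as in the response
pf_mask = spherical_roi((-59, -31, 40), 12.0)
print(pf_mask.sum(), "voxels in the ROI")
```

      With 2 mm voxels, a 12 mm sphere holds roughly 900 voxels (about 7,240 mm³ divided by 8 mm³ per voxel). One `specific_activation` value per participant and task could then be correlated across participants, for example with `np.corrcoef`; the exact correlation method used by the authors is not stated here.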

      Reviewer #1 (Recommendations For The Authors):

      (1) I may not fully grasp some of the arguments. In the abstract, what does the term "intermediate-level" mean, and why is it an intermediate-level state? In the sentence "the existence of a specific cognitive module in the human brain dedicated to materiality", I cannot see a clear link between technical cognition and the word "materiality".

      We used the term materiality to refer to a potential human trait that allows us to shape the physical world according to our ends, by using and making tools and transmitting them to others. This is a reference to Allen et al. (2020; PNAS): “We hope this empirical domain and modeling framework can provide the foundations for future research on this quintessentially human trait: using, making, and reasoning about tools and more generally shaping the physical world to our ends” (p. 29309). Scientists (including archaeologists, economists, psychologists, neuroscientists) interested in human materiality have tended to focus on how we manipulate things according to our thoughts (motor cognition) or how we conceptualize our behaviour to transmit it to others (language, social cognition). However, little has been said about the intermediate level, that is, technical cognition. We added the term “technical cognition” here, which should help to make the connection more quickly.

      “Yet, little has been said about the intermediate-level cognitive processes that are directly involved in mastering this materiality, that is, technical cognition.” (p. 2)

      (2) The introduction could provide more details on why the issue of "generalizability and specificity" is important to address, to clarify the significance of the research question.

      We followed this comment and added a sentence to explain why it is important to address this research question. Again, we thank Reviewer 1 for their helpful comments.

      “Here we focus on two key aspects of the technical-reasoning hypothesis that remain to be addressed: Generalizability and specificity. If technical reasoning is a specific form of reasoning oriented towards the physical world, then it should be implicated in all (the generalizability question) and only (the specificity question) the situations in which we need to think about the physical properties of our world.” (p. 5)

      Reviewer #2 (Public Review):

      Summary:

      The goal of this project was to test the hypothesis that a common neuroanatomic substrate in the left inferior parietal lobule (area PF) underlies reasoning about the physical properties of actions and objects. Four functional MRI (fMRI) tasks were created to test this hypothesis. Group contrast maps were then obtained for each task, and overlap among the tasks was computed at the voxel level. The principal finding is that the left PF exhibited a differentially greater BOLD response in tasks requiring participants to reason about the physical properties of actions and objects (referred to as technical reasoning). In contrast, there was no differential BOLD response in the left PF when participants engaged in an fMRI variant of Raven's Progressive Matrices to assess fluid cognition.

      Strengths:

      This is a well-written manuscript that builds on extensive prior work from this group mapping the brain areas and cognitive mechanisms underlying object manipulation, technical reasoning, and problem-solving. Major strengths of this manuscript include the use of control conditions to demonstrate that there are differentially greater BOLD responses in area PF over and above the baseline condition of each task. Another strength is the demonstration that area PF is not responsive in tasks assessing fluid cognition. One could argue that PF simply responds to a greater extent in a harder condition than in an easier condition of a task; the analysis of data from Task 3 rules out this alternative interpretation. The methods and analysis are described in sufficient detail for others to replicate the study, and the materials and code for data analysis are publicly available.

      We sincerely thank Reviewer 2 for their valuable comments, which helped us improve the MS. 

      Weaknesses:

      The first weakness is that the conclusions of the manuscript rely on there being overlap among group-level contrast maps presented in Figure 2. The problem with this conclusion is that different participants engaged in different tasks. Never is an analysis performed to demonstrate that the PF region identified in e.g., participant 1 in Task 2 is the same PF region identified in Participant 1 in Task 4.

      We added new analyses that demonstrated that “the PF region identified in e.g., participant 1 in Task 2 is the same PF region identified in Participant 1 in Task 4”. We thank Reviewer 2 for this comment, because these new analyses reinforced our interpretation.

      “The conjunction analysis reported was subject to at least two key limitations that needed to be overcome to ensure a correct interpretation of our findings. The first was that the tasks could recruit the same regions for different cognitive functions (same-region-but-different-function interpretation). The second was that the activated regions of the different tasks could be adjacent but not overlap at finer resolutions (adjacency interpretation). We tested the same-region-but-different-function interpretation by conducting additional ROI analyses, which consisted of correlating the specific activation of the left area PF (i.e., the difference in mean Blood-Oxygen Level Dependent [BOLD] parameter estimates between the experimental and control conditions) in the psychotechnical task, the fluid-cognition task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. This analysis did not include the mechanical problem-solving task because the sample of participants was not the same for this task. As shown in Fig. 5, we found significant correlations between all the tasks that were hypothesized as recruiting technical reasoning, i.e., the psychotechnical task and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .05). By contrast, no significant correlation was obtained between these three tasks and the fluid-cognition task (all p > .15). This finding invalidates the same-region-but-different-function interpretation by revealing a coherent pattern in the activation of the left area PF in situations in which participants were supposed to reason technically. We examined the adjacency interpretation by analysing the specific locations of individual peak activations within the left area PF ROI for the mechanical problem-solving task, the psychotechnical task, the fluid-cognition task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. 
These peaks, which corresponded to the maximum value of activation obtained for each participant within the left area PF ROI, are reported in Fig. 6. As can be seen, the peaks of the fluid-cognition task were located more anteriorly, in the left area PFt (Parietal Ft) and the postcentral cortex, compared to the peaks of the other four tasks, which were more posterior, in the left area PF. Statistical analyses based on the y coordinates of the individual activation peaks confirmed this description (Fig. 6). Indeed, the y coordinates of the peaks of the mechanical problem-solving task, the psychotechnical task and the PHYS-Only and INT+PHYS conditions of the mentalizing task were posterior to the y coordinates of the peaks of the fluid-cognition task (all p < .05), whereas no significant differences were reported between the four tasks (all p > .05). These findings speak against the adjacency interpretation by revealing that participants recruited the same part of the left area PF to perform tasks involving technical reasoning.” (p. 11-13)

      A second weakness is that variance in accuracy between tasks is not addressed. It is clear from the plots in the supplemental materials that some participants score below chance (~50%). This means that half (or more) of the fMRI trials of some participants are incorrect. The methods section does not mention how inaccurate trials were handled. Moreover, if 50% is chance, below-chance performance suggests that some participants did not understand task instructions and were systematically selecting the incorrect item.

      It is true that the experimental conditions were more difficult than the control conditions, with some participants who performed at or below 50% in the experimental conditions. We added a section in the MS to stress this aspect. To examine whether this potential difficulty effect biased our interpretation, we conducted new ROI analyses by removing all the participants who performed at or below the chance level. These analyses revealed the same results as when no participant was excluded, suggesting that this did not bias our interpretation.

      “As mentioned above, the experimental conditions of all the tasks were more difficult than their control conditions. As a result, the specific activation of the left area PF documented above could simply reflect that this area responds to a greater extent in a harder condition relative to an easy condition of a task. This interpretation is nevertheless ruled out by the results obtained with the fluid-cognition task: we found no specific activation of the left area PF in this task even though its experimental condition was more difficult than its control condition. To test this effect of difficulty more directly, we conducted new ROI analyses by removing all the participants who performed at or below 50% (Fig. S2). These new analyses replicated the initial analyses by showing a greater activation of the left area PF in the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .001), but not in the fluid-cognition task (p = .48). In sum, the ROI analyses corroborated the whole-brain analyses and ruled out the potential effect of difficulty.” (p. 11)

      A third weakness is related to the fluid cognition task. In the fMRI task developed here, the participant must press a left or right button to select between 2 rows of 3 stimuli while only one of the 3 stimuli is the correct target. This means that within a 10-second window, the participant must identify the pattern in the 3x3 grid and then separately discriminate among 6 possible shapes to find the matching stimulus. This is a hard task that is qualitatively different from the other tasks in terms of the content being manipulated and the time constraints.

      We acknowledge that the fluid-cognition task involved a design that differed from the other tasks. However, this was also true for the other tasks, as the design also differed between the mechanical problem-solving task, the psychotechnical task, and the mentalizing task. Nevertheless, despite these distinctions, we found a consistent activation of the left area PF in these tasks with different designs including in the psychotechnical task, which seemed as difficult as the fluid-cognition task.

      “Region of interest (ROI) results. We conducted additional analyses to test the robustness of our findings. One of our results was that we found no specific activation of the left area PF in the fluid-cognition task, unlike in the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. However, this negative result needed exploration at the ROI level. Therefore, we created a spherical ROI of the left area PF with a radius of 12 mm, centred on MNI coordinates (–59; –31; 40). This ROI was literature-defined to ensure the independence of its selection (40). ROI results are shown in Fig. 4. The analyses confirmed the results obtained with the whole-brain analyses by indicating a greater activation of the left area PF in the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .001), but not in the fluid-cognition task (p = .35).” (p. 10-11)

      In sum, this is an interesting study that tests a neuro-cognitive model whereby the left PF forms a key node in a network of brain regions supporting technical reasoning for tool and non-tool-based tasks. Localizing area PF at the level of single participants and managing variance in accuracy is critically important before testing the proposed hypotheses.

      We thank Reviewer 2 for this positive evaluation and their suggestions. As detailed in our response, our revision took into consideration both the localization of the left area PF at the level of single participants and the variance in accuracy. 

      Reviewer #2 (Recommendations For The Authors):

      Did the fMRI data undergo high-pass temporal filtering prior to modeling the effects of interest? Participants engaged in a long (17-24 minute) run of fMRI data collection. High-pass filtering of the data is critically important when managing temporal autocorrelation in the fMRI response (e.g., see Shinn et al., 2023, Functional brain networks reflect spatial and temporal autocorrelation. Nature Neuroscience).

      Yes. We added this information.

      “Regressors of non-interest resulting from 3D head motion estimation (x, y, z translation and three axes of rotation) and a set of cosine regressors for high-pass filtering were added to the design matrix.” (p. 25-26)
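      As an illustration of how cosine regressors implement high-pass filtering inside a GLM, the Python sketch below builds a discrete cosine (DCT-II) basis set and appends it, together with six motion regressors, to a design matrix. It is a hedged sketch rather than the authors' code: the 128 s cutoff (the SPM default), the TR of 2 s, the 600-scan run length, and the placeholder task and motion columns are all assumptions, and `cosine_drift_regressors` is a hypothetical helper.

```python
import numpy as np


def cosine_drift_regressors(n_scans, tr, cutoff_s=128.0):
    """DCT-II basis functions with periods of at least cutoff_s seconds.

    The constant term is excluded (the design matrix gets its own intercept).
    The count follows the common SPM convention floor(2 * n_scans * tr / cutoff_s + 1),
    minus the constant.
    """
    order = int(np.floor(2.0 * n_scans * tr / cutoff_s + 1))
    t = np.arange(n_scans)
    cols = [np.sqrt(2.0 / n_scans) * np.cos(np.pi * (2 * t + 1) * k / (2 * n_scans))
            for k in range(1, order)]
    return np.column_stack(cols) if cols else np.empty((n_scans, 0))


n_scans, tr = 600, 2.0                        # hypothetical 20-min run at TR = 2 s
drift = cosine_drift_regressors(n_scans, tr)  # 18 low-frequency cosines here
rng = np.random.default_rng(0)
task = rng.random((n_scans, 2))               # placeholder convolved task regressors
motion = rng.standard_normal((n_scans, 6))    # placeholder 3 translations + 3 rotations
X = np.column_stack([task, motion, drift, np.ones(n_scans)])
print(X.shape)  # (600, 27)
```

      Because the cosines are orthonormal and absorb slow drifts, including them as nuisance regressors removes low-frequency signal from the task estimates, which is equivalent to high-pass filtering the data and the design together. Packages such as SPM and nilearn provide built-in versions of this basis.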

      Including scales in Figure 2 would help the reader interpret the magnitude of the BOLD effects.

      We added this information in Figure 3 (Figure 2 in the initial version of the MS).

      It was difficult to inspect the small thumbnail images of the task stimuli in Figure 1. Higher resolution versions of those stimuli would help facilitate understanding of the task design and trial structure.

      We changed both Figure 1 and Figure S1.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript reports two neuroimaging experiments assessing commonalities and differences in activation loci across mechanical problem-solving, technical reasoning, fluid cognition, and "mentalizing" tasks. Each task includes a control task. Conjunction analyses are performed to identify regions in common across tasks. As Area PF (a part of the supramarginal gyrus of the inferior parietal lobe) is involved across 3 of the 4 tasks, the investigators claim that it is the hub of technical cognition.

      Strengths:

      The aim of finding commonalities and differences across related problem-solving tasks is a useful and interesting one.

      The experimental tasks themselves appear relatively well-thought-out, aside from the concern that they are differentially difficult.

      The imaging pipeline appears appropriate.

      We thank Reviewer 3 for their constructive comments, which helped us improve the MS.

      Weaknesses:

      (1) Methodological

      As indicated in the supplementary tables and figures, the experimental tasks employed differ markedly in 1) difficulty and 2) experimental trial time. Response latencies are not reported (but are of additional concern given the variance in difficulty). There is concern that at least some of the differences in activation patterns across tasks are the result of these fundamental differences in how hard various brain regions have to work to solve the tasks and/or how much of the trial epoch is actually consumed by "on-task" behavior. These difficulty issues should be controlled for by 1) separating correct and incorrect trials, 2) for correct trials, entering response latency as a regressor in the General Linear Models, and 3) entering trial duration in the GLMs.

      We thank Reviewer 3 for this comment. It is true that the experimental conditions were more difficult than the control conditions, with some participants who performed at or below 50% in the experimental conditions. We added a section in the MS to stress this aspect. We could not conduct new analyses by separating correct and incorrect trials because, for each task, participants had to respond only on the last item of the block. Therefore, we did not record a response for each event. Nevertheless, we could examine whether this potential difficulty effect biased our interpretation, by conducting new ROI analyses in which we removed all the participants who performed at or below the chance level. These analyses revealed the same results as when no participant was excluded, suggesting that this did not bias our interpretation. 

      “As mentioned above, the experimental conditions of all the tasks were more difficult than their control conditions. As a result, the specific activation of the left area PF documented above could simply reflect that this area responds to a greater extent in a harder condition relative to an easy condition of a task. This interpretation is nevertheless ruled out by the results obtained with the fluid-cognition task: we found no specific activation of the left area PF in this task even though its experimental condition was more difficult than its control condition. To test this effect of difficulty more directly, we conducted new ROI analyses by removing all the participants who performed at or below 50% (Fig. S2). These new analyses replicated the initial analyses by showing a greater activation of the left area PF in the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .001), but not in the fluid-cognition task (p = .48). In sum, the ROI analyses corroborated the whole-brain analyses and ruled out the potential effect of difficulty.” (p. 11)

      A related concern is that the control tasks also differ markedly in the degree to which they were easier and faster than their corresponding experimental task. Thus, some of the control tasks seem to control much better for difficulty and time on task than others. For example, the control task for the psychotechnical task simply requires the indication of which array contains a simple square shape (i.e., it is much easier than the psychotechnical task), whereas the control task for mechanical problem-solving requires mentally fitting a shape into a design, much like solving a jigsaw puzzle (i.e., it is only slightly easier than the experimental task).

      It is true that some control conditions could be easier than others. These differences reinforced the common activation found in the left area PF in the tasks hypothesized as involving technical reasoning, because this activation survived irrespective of the differences in experimental design. For us, the rationale is the same as for a meta-analysis, in which we try to find what is common to a great variety of tasks. The only detrimental consequence we identified is that this difference could explain why we did not find a specific activation of the left area PF in the fluid-cognition task, as if the left area PF was more responsive when the task was difficult. This possibility assumes that the experimental condition of the fluid-cognition task is much more difficult than its control condition compared to what can be seen in the other tasks. As Reviewer 2 stressed in Point 1, this interpretation is unlikely, because the differences between the experimental and control conditions in the mechanical problem-solving and psychotechnical tasks were similar to those in the fluid-cognition task. In addition, the new ROI analyses in which we removed all the participants who performed at or below the chance level in the experimental conditions reproduced our initial results.

      (2) Theoretical 

      The investigators seem to overlook prior research that does not support their perspective and their writing seems to lack scientific objectivity in places. At times they over-reach in the claims that can be made based on the present data. Some claims need to be revised/softened.

      As this comment is also mentioned below, please find our response to it below.

      Reviewer #3 (Recommendations For The Authors):

      (1) Because of the high level of detail, Figures 1 and S2 (particularly the mentalizing task and mechanical problem-solving task, and their controls) are very hard to parse, even when examined relatively closely. It is suggested that these figures be broken down into separate panels for Experiment 1 and Experiment 2 to facilitate understanding.

      We changed both Figure 1 and Figure S1.

      (2) The behavioral data (including response latencies) should be reported in the main results section of the paper and not in a supplement.

      The behavioural data are now reported in the main results. We did not report response latencies because participants were not prompted to respond as quickly as possible.

      “Behavioural results. All the behavioural results are given in Fig. 2. As shown, scores were lower in the experimental conditions than in the control conditions for all the tasks (all p < .05). In other words, the experimental conditions were more difficult than the control conditions. This difference in difficulty can also be illustrated by the fact that some participants performed at or below the chance level in the experimental conditions whereas none did so in the control conditions.” (p. 8)

      (3) The investigators seem to overlook prior research that does not support their perspective and their writing seems to lack scientific objectivity in places. At times they over-reach in the claims that can be made based on the present data. For example, claims that need to be revised/softened include:

      Abstract: "Area PF... can work along with social-cognitive skills to resolve day-to-day interactions that combine social and physical constraints". This statement is overly speculative.

      This statement is based on the fact that we reported a combined activation of the technical-reasoning network and the mentalizing network in the INT+PHYS condition of the mentalizing task. This suggests that both networks need to work together to solve a day-to-day problem in which both the physical constraints of the situation and the intention of the individual must be integrated. Our findings replicated previous ones with a similar task (e.g., Brunet et al. 2000; Völlm et al., 2006), in which the authors gave an interpretation similar to ours, considering that this task requires understanding physical and social causes. Perhaps the reference to the results of the mentalizing task was not explicit enough. We added “day-to-day” before “problem” in the part of the discussion in which we discuss this possibility to make this aspect clearer.

      “In broad terms, the results of the mentalizing task indicate that causal reasoning has distinct forms and that it recruits distinct networks of the human brain (Social domain: Mentalizing; Physical domain: Technical reasoning), which can nevertheless interact together to solve day-to-day problems in which several domains are involved, such as in the INT+PHYS condition of the mentalizing task.” (p. 16)

      Introduction: "The manipulation-based approach... remains silent on the more general cognitive mechanisms...that must also encompass the use of unfamiliar or novel tools". This statement seems to be based on an overly selective literature review. There are a number of studies in which the relationship between novel and familiar tool selection/use has been explored (e.g., Buchman & Randerath, 2017; Mizelle & Wheaton, 2010; Silveri & Ciccarelli, 2009; Stoll, Finkel et al., 2022; Foerster, 2023; Foerster, Borghi, & Goslin, 2020; Seidel, Rijntjes et al., 2023).

      We thank Reviewer 3 for this comment. Even if we accept the idea that we possess specific sensorimotor programs about tool manipulation, it remains that these programs cannot explain how an individual decides to bend a wire to make a hook or to pour water into a container to retrieve a target. Indeed, such behaviour has been reported in nonhuman animals, such as crows (Weir et al., 2002, Nature) or orangutans (Mendes et al., 2007, Biology Letters). In these studies, the question is whether these nonhuman animals understand the physical causes or not, but the question of sensorimotor programs is never addressed (to our knowledge). This is also true of developmental studies on tool use (e.g., Beck et al., 2011, Cognition; Cutting et al., 2011, Journal of Experimental Child Psychology). This is what we meant: the manipulation-based approach is not equipped to explain how people solve physical problems by using or making tools – or any object – or by building constructions or producing technical innovations. However, we agree that some papers have explored the link between common and novel tool use and have suggested that both could recruit common sensorimotor programs. It is noteworthy that these studies do not pit the predictions of the manipulation-based approach against those of the reasoning-based approach, so both interpretations generally remain viable, as stressed by Seidel et al. (2023), one of the papers recommended by Reviewer 3.

      “Apparently, the presentation of a graspable object that is recognizable as a tool is sufficient to provoke SMG activation, whether one tends to see the function of SMG to be either “technical reasoning” (Osiurak and Badets 2016; Reynaud et al. 2016; Lesourd et al. 2018; Reynaud et al. 2019) or “manipulation knowledge” (Sakreida et al. 2016; Buxbaum 2017; Garcea et al. 2019b).” (Seidel et al., 2023; p. 9)

      Regardless, as suggested by Reviewer 3, these papers deserve to be cited, and this part needed to be rewritten to emphasize the “making, construction, and innovation” dimension rather than the “unfamiliar and novel tool use” dimension, to avoid any ambiguity.

      “This manipulation-based approach has provided interesting insights (12–16) and even elegant attempts to explain how these sensorimotor programs could support the use of both unfamiliar or novel tools (17–20), but remains silent on the more general cognitive mechanisms behind human technology that include the use of common and unfamiliar or novel tools but must also encompass tool making, construction behaviour, technical innovations, and transmission of technical content.” (p. 3)

      Introduction: "Here we focus on two important questions... to promote the technical-reasoning hypothesis as a comprehensive cognitive framework..." (italics added). This and other similar statements should be rewritten as testable scientific hypotheses rather than implying that the point of the research is to promote the investigators' preferred view.

      We agree that our phrasing could seem inappropriate here. What we meant is that the technical-reasoning hypothesis could become an interesting framework for studying the cognitive bases of human technology only if we are able to verify some of its key facets. As suggested, we rewrote this part, as well as the abstract and the first paragraph of the discussion.

      “Here we focus on two key aspects of the technical-reasoning hypothesis that remain to be addressed: Generalizability and specificity. If technical reasoning is a specific form of reasoning oriented towards the physical world, then it should be implicated in all (the generalizability question) and only (the specificity question) the situations in which we need to think about the physical properties of our world.” (p. 5)

      Introduction: The Goldenberg and Hagmann paper cited actually shows that familiar tool use may be based either on retrieval from semantic memory or by inferring function from structure (mechanical problem solving); in other words, the investigators saw a role for both kinds of information, and the relationship between mechanical problem solving and familiar tool use was actually relatively weak. This requires correction.

      We disagree with Reviewer 3 on this point. The whole sentence is as follows:

      “This silence has been initially broken by a series of studies initiated by Goldenberg and Hagmann (9), which has documented a behavioural link in left brain-damaged patients between common tool use and the ability to solve mechanical problems by using and even sometimes making novel tools (e.g., extracting a target out from a box by bending a wire to create a hook) (9, 17).” (p. 3-4)

      We did not mention the interpretations given by Goldenberg and Hagmann about the link with the pantomime task, but focused only on the link they reported between common tool use and novel tool use. This is factual. We also disagree that the link between common tool use and novel tool use was weak.

      “The hypothesis put forward in the introduction predicts that knowledge about prototypical tool use assessed by pantomime of tool use and the ability to infer function from structure assessed by novel tool selection can both contribute to the use of familiar tools. Indeed results of both tests correlated significantly with the use of familiar tools (pantomime of tool use: r = 0.77, novel tool selection: r = 0.62; both P < 0.001), but there was also a significant correlation between the two tests (r = 0.64, P < 0.001).” (Goldenberg & Hagmann, 1998; p. 585)

      As this quote shows, they reported a significant correlation between novel tool selection and the use of familiar tools. It is also noteworthy that the novel tool selection test and the pantomime test were correlated with each other. Georg Goldenberg told one of the authors (F. Osiurak; personal communication) that this result prompted him to revise his idea that pantomime could assess “semantic knowledge”, which explains why he did not use it again as a measure of semantic knowledge. Instead, he used a classical semantic matching task in his 2009 Brain paper with Josef Spatt, in which they found a clearer dissociation between semantic knowledge and common/novel tool use, not only at the behavioral level but also at the cerebral level.

      Introduction: Please expand and clarify this sentence: "However, this involvement seems to be task-dependent, contrary to the systematic involvement of left area PF." The IFG and LOTC activations observed in prior studies are of interest as well. Were they indeed all task-dependent in these studies?

      We agree that this sentence is confusing. We meant that, in the studies reported just above in the paragraph, these regions were not systematically reported, unlike the left area PF. As we think that this information was not crucial for the logic of the paper, we preferred to remove it.

      Introduction: If implicit mechanical knowledge is acquired through interactions with objects, how is that implicit knowledge conveyed to pass on the material culture to others?

      We thank Reviewer 3 for this comment. Although mechanical knowledge is implicit, it can be transmitted indirectly to other individuals, as shown in two papers we published in Nature Human Behaviour (Osiurak et al., 2021) and Science Advances (Osiurak et al., 2022). Verbal teaching is not the only way to transmit information. Others include gestural teaching (e.g., pointing to the important aspects of a task to make them salient to the learner), observation without teaching (i.e., observing someone without their knowledge), and reverse engineering (i.e., scrutinizing an artifact made by someone else). We have shown that even in reverse-engineering conditions, participants can benefit from what previous participants have done to increase their understanding of a physical system. In other words, all these forms of transmission allow learners to understand new physical relationships without waiting for these relationships to occur by chance in the environment. A wide literature on social learning describes how knowledge can be transmitted without explicit communication. In fact, it is very likely that such forms of transmission were already present in our ancestors, allowing them to start accumulating knowledge without symbolic language. We did not add this information to the MS because we think it is somewhat beyond its scope. Nevertheless, we cited relevant literature to help interested readers find it.

      “Yet, recent accounts have proposed that non-social cognitive skills such as causal understanding or technical reasoning might have played a crucial role in cumulative technological culture (6, 29, 66). Support for these accounts comes from micro-society experiments, which have demonstrated that the improvement of technology over generations is accompanied by an increase in its understanding (67, 68), or that learners’ technical-reasoning skills are a good predictor of cumulative performance in such micro-societies (33, 69).” (p. 19)

      What distinguishes this implicit mechanical knowledge from stored knowledge about object manipulation? Are these two conceptualizations really demonstrably (testably) different?

      We agree that it is difficult to distinguish between these two hypotheses, as suggested by Seidel et al. (2023) cited above (see Reviewer 3 Point 8). We have conducted several studies to test the opposite predictions derived from each hypothesis. The main distinction concerns the understanding of physical materials and forces, which is central to the technical-reasoning hypothesis but not to the manipulation-based approach. Indeed, sensorimotor programs for tool manipulation are not assumed to contain information about physical materials and forces. In the present study, understanding physical materials and forces was needed in the four tasks hypothesized to require technical reasoning, i.e., the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. We can illustrate this with items from each of these tasks. Figure 1A shows an item from the mechanical problem-solving task.

      As explained in the MS, participants had memorized the five possible tools before the scanner session. Thus, for 4 seconds, they had to imagine which of these tools could be used to extract the target from the box. We did so to encourage them to reason about mechanical solutions based on the physical properties of the problem. Then, they had 3 seconds to select the tool with the appropriate shape, here the right one. In this case, the motor action remains the same (i.e., pulling). Another illustration comes from the psychotechnical task (Figure 1B).

      In this task, the participant had to judge whether the boat-tractor connection was better in the left or in the right picture. This requires reasoning about physical forces, but there is no need to recruit sensorimotor programs for tool manipulation. A last example comes from the PHYS-Only condition of the mentalizing task (Figure 1D); the logic is the same for the INT+PHYS condition, except that the character’s intentions must also be taken into consideration.

      Here the participant must reason about which picture shows what is physically possible. In this task, there is no need to recruit sensorimotor programs for tool manipulation. In sum, what these three tasks have in common is the requirement to reason about physical materials and forces. We do not deny that motor actions could be simulated in the mechanical problem-solving task, but no motor action needed to be simulated in the other three tasks. Therefore, what was common to all these tasks was the potential involvement of technical reasoning, but not of sensorimotor programs for tool manipulation. Of course, an alternative is to consider that motor actions are needed in all situations, including situations where no “manipulable tool” is presented, such as a tractor and a boat, a pulley, or a cannon. We cannot rule out this alternative, which we nevertheless consider problematic because it would make the manipulation-based approach difficult to test, as motor actions would be everywhere. We deliberately chose not to introduce a debate between the reasoning-based approach and the manipulation-based approach, preferring a more positive writing style that stresses the insights from the present study. Note that we stressed the merits of the manipulation-based approach in the introduction because we sincerely think that this approach has provided interesting insights. Given Reviewer 3’s comment (see also Reviewer 1 Point 2), we understand and agree that something must nevertheless be said about how the manipulation-based approach could interpret our results, which also highlights the potential limitations of our interpretations. Therefore, we added a specific section to the discussion that addresses this aspect in more detail.

      “The second limitation concerns the alternative interpretation that the left area PF is not central to technical reasoning but to the storage of sensorimotor programs about the prototypical manipulation of common tools. Here we show that the left area PF is recruited even in situations in which participants do not have to process common manipulable tools. For instance, some items of the psychotechnical task consisted of pictures of tractor, boat, pulley, or cannon. The fact that we found a common activation of the left area PF in such tasks as well as in the mechanical problem-solving task, in which participants could nevertheless simulate the motor actions of manipulating novel tools, indicates that this brain area is not central to tool manipulation but to physical understanding. That being said, some may suggest that viewing a boat or a cannon is enough to incite the simulation of motor actions, so our tasks were not equipped to distinguish between the manipulation-based approach and the reasoning-based approach. We have already shown that the left area PF is more involved in tasks that focus on the mechanical dimension of the tool-use action (e.g., the mechanical interaction between a tool and an object) than its motor dimension (i.e., the interaction between the tool and the effector [e.g., 24, 40]). Nevertheless, we recognize that future research is still needed to test the predictions derived from these two approaches.” (p. 18-19)

      Introduction and throughout: The framing of left Area PF as a special area for technical reasoning is overly reductionistic from a functional neuroanatomic perspective in that it ignores a large relevant literature showing that the region is involved with many other tasks that seem not to require anything like technical cognition. Indeed, entering the coordinates -56, -29, 36 (reported as the peak coordinates in common across the studied tasks) in Neurosynth reveals that 59 imaging studies report activations within 3 mm of those coordinates; few are action-related (a brief review indicated studies of verbal creativity, texture processing, reading, somatosensory processing, stress reactions, attentional selection, etc.). Please acknowledge the difficulty of claiming that a large brain region should be labeled the brain's technical reasoning area when it seems to also participate in so much else. The left IPL (including area PF) is densely connected to the ventral premotor cortex, and this network is activated in language and calculation tasks as well as tool use tasks (e.g., Matsumoto, Nair, et al., 2012). What other constructs might be able to unite this disparate literature, and are any of these alternative constructs ruled out by the present data? Lacking this objective discussion, the manuscript does read as a promotion of the investigators' preferred viewpoint.

      We thank Reviewer 3 for this comment. As stressed in the initial version of the MS, we did not claim that the left area PF is sufficient, only that it is central to the network that allows us to reason about the physical world. Regardless, we agree that an objective discussion of this aspect was needed to prevent readers from misunderstanding our purpose, and we added a section on it as suggested.

      “Before concluding, we would like to point out two potential limitations of the present study. The first limitation concerns the fact that the literature has documented the recruitment of the left area PF in many neuroimaging experiments in which there was no need to reason about physical events (e.g., language tasks). This can be easily illustrated by entering the left area PF coordinates in the Neurosynth database.

      This finding could be enough to refute the idea that this brain area is specific to technical reasoning. Although this limitation deserves to be recognized, it is also true for many other findings. For instance, sensory or motor brain regions such as the precentral or the postcentral cortex have been found activated in many non-motor tasks, the visual word form area in non-language tasks, or the Heschl’s gyrus in nonmusical tasks. This remains a major challenge for scientists, the question being how to solve these inconsistencies that can result from statistical errors or stress that considerable effort is needed to understand the very functional nature of these brain areas. Thus, understanding that the left area PF is central to physical understanding can be viewed as a first essential step before discovering its fundamental function, as suggested by the functional polyhedral approach (56).” (p. 18)

      Discussion: The discussion of a small cluster in the IFG (pars opercularis) that nearly survived statistical correction is noteworthy in light of the above point. This further underscores the importance of discussing networks and not just single brain regions (such as area PF) when examining complex processes. The investigators note, "a plausible hypothesis is that the left IFG integrates the multiple constraints posed by the physical situation to set the ground for a correct reasoning process, such as it could be involved in syntactic language processing". In fact, the hypothesis that the IFG and SMG are together related to resolving competition has been previously proposed, as has the more specific hypothesis that the SMG buffers actions and that the context-appropriate action is then selected by the IFG (e.g., Buxbaum & Randerath, 2018). The parallels with the way the SMG is engaged with competing lexical or phonological alternatives (e.g., Peramunage, Blumstein et al., 2011) have also been previously noted.

      We added the reference to Buxbaum and Randerath (2018) in this section.

      “The functional role of the left IFG in the context of tool use has been previously discussed (24) and a plausible hypothesis is that the left IFG integrates the multiple constraints posed by the physical situation to set the ground for a correct reasoning process, such as it could be involved in syntactic language processing (for a somewhat similar view, see [51]).” (p. 16-17)

      Introduction and Discussion: Please clarify how the technical reasoning network overlaps with or is distinct from the tool-use network reported by many previous investigators.

      We added a couple of sentences in the discussion to clarify this point.

      “It should be clear here that we do not advocate the localizationist position simply stating that activation in the left area PF is the necessary and sufficient condition for technical reasoning. We rather defend the view according to which it requires a network of interacting brain areas, one of them – and of major importance – being the left area PF. This allows the engagement of different configurations of cerebral areas in different technical-reasoning tasks, but with a central process acting as a stable component: The left area PF. Thus, when people intend to use physical tools, it can work in concert with brain regions specific to object manipulation and motor control, thereby forming another network, the tool-use network. It can also interact with brain regions specific to intentional gestures to form a “social-learning” network that allows people to enhance their understanding about the physical aspects of a technical task (e.g., the making of a tool) through communicative gestures such as pointing gestures (42). The major challenge for future research is to specify the nature of the cognitive process supported by the left area PF and that might be involved in the broad understanding of the physical world.” (p. 14)

      Discussion: All of the experimental tasks require a response from a difficult choice in an array, and all of the tasks except for the fluid cognition task are likely to require prediction or simulation of a motion trajectory (whether an embodied or disembodied trajectory is unclear). The Discussion does mention the related (but distinct) idea of an "intuitive physics engine", a "kind of simulator". Please clarify how this study can rule out these alternative interpretations of the data. If the study cannot rule out these alternatives, the claims of the study (and the paper title, which labels PF as a technical cognition area) should be scaled back considerably.

      We thank Reviewer 3 for this comment. The authors of the papers on intuitive physics engines or associative learning do not suggest that these processes are embodied. As discussed above, we clarified our perspective on the role of the left area PF and hope that these modifications help the reader better understand it. We warmly thank Reviewer 3 for their comments, which considerably helped us improve the MS.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Hüppe and colleagues had already developed an apparatus and an analytical approach to capture swimming activity rhythms in krill. In a previous manuscript they explained the system, and here they employ it to show a circadian clock, supplemented by exogenous light, produces an activity pattern consistent with "twilight" diel vertical migration (DVM; a peak at sunset, a midnight sink, and a peak in the latter half of the night).

      They used light:dark (LD) followed by dark:dark (DD) photoperiods at two times of the year to confirm the circadian clock, coupled with DD experiments at four times of year to show rhythmicity occurs throughout the year along with DVM in the wild population. The individual activity data show variability in the rhythmic response, which is expected. However, their results showed rhythmicity was sustained in DD throughout the year, although the amplitude decayed quickly. The interpretation of a weak clock is reasonable, and they provide a convincing justification for the adaptive nature of such a clock in a species that has a wide distributional range and experiences various photic environments. These data also show that exogenous light increases the activity response and can explain the morning activity bouts, with the circadian clock explaining the evening and late-night bouts. This acknowledgement that vertical migration can be driven by multiple proximate mechanisms is important.

      The work is rigorously done, and the interpretations are sound. I see no major weaknesses in the manuscript. Because a considerable amount of processing is required to extract and interpret the rhythmic signals (see Methods and previous AMAZE paper), it is informative to have the individual activity plots of krill as a gut check on the group data.

      The manuscript will be useful to the field as it provides an elegant example of looking for biological rhythms in a marine planktonic organism and disentangling the exogenous response from the endogenous one. Furthermore, as high latitude environments change, understanding how important organisms like krill have the potential to respond will become increasingly important. This work provides a solid behavioral dataset to complement the earlier molecular data suggestive of a circadian clock in this species.

      We appreciate the positive evaluation of our work by Reviewer 1, acknowledging our approach to record locomotor activity in krill and the importance of the findings in assessing krill’s potential to respond to environmental change in their habitat.

      Reviewer #2 (Public review):

      Summary:

      This manuscript provides experimental evidence on circadian behavioural cycles in Antarctic krill. The krill were obtained directly from krill fishing vessels and the experiments were carried out on board using an advanced incubation device capable of recording activity levels over a number of days. A number of different experiments were carried out where krill were first exposed to simulated light:dark (L:D) regimes for some days followed by continuous darkness (DD). These were carried out on krill collected during late autumn and late summer. A further set of experiments was performed on krill across three different seasons (summer, autumn, winter), where incubations were all DD conditions. Activity was measured as the frequency by which an infrared beam close to the top of the incubation tube was broken over unit time. Results showed that patterns of increased and decreased activity that appeared synchronised to the LD cycle persisted during the DD period. This was interpreted as evidence of the operation of an internal (endogenous) clock. The amplitude of the behavioural cycles decreased with time in DD, which further suggests that this clock is relatively weak. The authors argued that the existence of a weak endogenous clock is an adaptation to life at high latitudes since allowing the clock to be modulated by external (exogenous) factors is an advantage when there is a high degree of seasonality. This hypothesis is further supported by seasonal DD experiments which showed that the periodicity of high and low activity levels differed between seasons.

      Strengths

      Although there have been many field observations of various circadian-type behaviours in Antarctic krill, relatively few experimental studies have been published considering this behaviour in terms of circadian patterns of activity. Krill are not a model organism, and obtaining them and incubating them in suitable conditions are both difficult undertakings. Furthermore, there is a need to consider what their natural circadian rhythms are without the overinfluence of laboratory-induced artefacts. For this reason alone, the setup of the present study is ideal to consider this aspect of krill biology. Furthermore, the equipment developed for measuring levels of activity is well-designed and likely to minimise artefacts.

      We would like to thank Reviewer 2 for their positive assessment of our approach to study the influence of the circadian clock on krill behavior. We are delighted that Reviewer 2 found our mechanistic approach to understanding daily behavioral patterns of Antarctic krill using the AMAZE set-up convincing, and that the challenging circumstances of working with a polar, non-model species are acknowledged.

      Weaknesses

      I have little criticism of the rationale for carrying out this work, nor of the experimental design. Nevertheless, the manuscript would benefit from a clearer explanation of the experimental design, particularly aimed at readers not familiar with research into circadian rhythms. Furthermore, I have a more fundamental question about the relationship between levels of activity and DVM on which I will expand below. Finally, it was unclear how the observational results made here related to the molecular aspects considered in the Discussion.

      (1) Explanation of experimental design - I acknowledge that the format of this particular journal insists that the Results are the first section that follows the Introduction. This nevertheless presents a problem for the reader since many of the concepts and terms that would generally be in the Methods are yet to be explained to the reader. Hence, right from the start of the Results section, the reader is thrown into the detail of what happened during the LD-DD experiments without being fully aware of why this type of experiment was carried out in the first place. Even after reading the Methods, further explanation would have been helpful. Circadian research of this sort often entrains organisms to certain light cycles and then takes the light away to see if the cycle continues in complete darkness, but this critical piece of knowledge does not come until much later (e.g. lines 369-372), leaving the reader guessing until this point why the authors took the approach they did. I would suggest the following: (1) that more effort is made in the Introduction to explain the exact LD/DD protocols adopted; (2) that a schematic figure is placed early on in the manuscript where the protocol is explained, including some logical flow charts, e.g. if the behavioural cycle continues in DD then an internal clock exists versus if the cycle does not continue in DD, exogenous cues dominate; followed by a major decrease in cyclic amplitude = weak clock versus a minor decrease = strong clock, and so on.

      We want to thank Reviewer 2 for pointing out that the experimental design and its rationale are not becoming clear early in the manuscript, especially for people outside the field of chronobiology. We added a new figure (now Fig. 1), illustrating the basic principle of chronobiological study design and how we adopted it. We also extended the description at the beginning of the Results section to clarify the rationale behind the experimental design.

      (2) Activity vs kinesis - in this study, we are shown data that (i) krill have a circadian cycle - incubation experiments; (ii) krill swarms display DVM in this region - echosounder data (although see my later point). My question here is regarding the relationship between what is being measured by the incubation experiments and the in situ swarm behaviour observations. The incubation experiments are essentially measuring the propensity of krill to swim upwards since they log the number of times an individual (or group) breaks a beam towards the top of the incubation tube. I argue that krill may still be highly active in the rest of the tube but just do not swim close to the surface, so this approach may not be a good measure of "activity". Otherwise, I suggest a more correct term for what is being measured is the level of "upward kinesis". As the authors themselves note, krill are negatively buoyant and must always be active to remain pelagic. What changes over the day-night cycle is whether they decide to expend that activity on swimming upwards, downwards or remaining at the same depth. Explaining the pattern as upward kinesis then also explains why swarms move upwards during the night. Just being more active at night may not necessarily result in them swimming upwards.

      We believe there is a slight misunderstanding in how what we call “activity” is measured. The experimental columns are equipped with five detector modules, evenly distributed over the height of the column. In our analysis we count all beam breaks caused by upward movement, i.e. every time a detector module is triggered after a detector module at a lower position has been triggered, and not only when the top detector module is triggered. In this way, we record upward swimming movements throughout the column, and not only when the krill swims all the way to the top of the column. This still means that what we are measuring is swimming activity, caused by upward swimming. We use this measure, to deliberately separate increased swimming activity, from baseline activity (i.e. swimming, which solely compensates for negative buoyancy) and inactivity (i.e. passive sinking).

      Higher activity is thus at first interpreted as an increase in swimming activity, which in the field may result in upwards-directed swimming but also could mean a horizontal increase in activity, for example, representing increased foraging and feeding activity. This would explain the daily activity pattern observed under LD cycles (now Fig. 3), which shows a general increase in activity during the dark phase. This nighttime increase could be used for both upward directed migration during sunset and horizontal directed swimming for feeding and foraging throughout the night.

      We added the following sentence to the description of the activity metric in the Methods section to clarify this point (lines 465-469):

      “To accomplish this, we organized the raw beam break data from all five detector modules in each experimental column in chronological order. We selected only those beam break detections that occurred after a detection in the detector module positioned lower on the column. Like this, we consider upward swimming movements throughout the full height of the column.”
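To make the activity metric concrete, here is a minimal sketch of the filtering step described above. It is not the AMAZE analysis code; it assumes each detection is logged as a (timestamp in seconds, module index) pair, with module 0 as the lowest of the five detector modules, and it interprets "after a detection in the detector module positioned lower" as the immediately preceding detection in the same column. The function names and the time-binning helper are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical log format: (timestamp_s, module), module 0 = lowest, 4 = highest.
BeamBreak = Tuple[float, int]


def upward_break_times(events: List[BeamBreak]) -> List[float]:
    """Return timestamps of detections that follow a detection at a lower module.

    Assumption: events come from one experimental column, and 'after' means
    the immediately preceding detection once events are sorted by time.
    """
    ordered = sorted(events)  # chronological order (ties broken by module index)
    return [
        t
        for (_, prev_module), (t, module) in zip(ordered, ordered[1:])
        if module > prev_module  # triggered after a lower module: upward movement
    ]


def counts_per_bin(times: List[float], bin_s: float) -> Dict[int, int]:
    """Count upward detections per time bin of width bin_s seconds (hypothetical helper)."""
    return dict(Counter(int(t // bin_s) for t in times))


if __name__ == "__main__":
    events = [(0.0, 0), (1.2, 1), (2.5, 1), (3.1, 3), (4.0, 2), (5.5, 4)]
    times = upward_break_times(events)
    print(times)                       # [1.2, 3.1, 5.5]
    print(counts_per_bin(times, 3.0))  # {0: 1, 1: 2}
```

Under this reading, repeated triggers of the same module and downward sequences (e.g. module 3 then module 2) are not counted, so passive sinking and swimming that only holds position contribute nothing to the metric.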

      (3) Molecular relevance - Although I am interested in molecular clock aspects behind these circadian rhythms, it was not made clear how the results of the present study allow any further insight into this. In lines 282 to 284, the findings of the study by Biscontin et al (2017) are discussed with regard to how TIM protein is degraded by light via the clock photoreceptor CRYPTOCHROME 1. This element of the Discussion would be a lot more relevant if the results of the present study were considered in terms of whether they supported or refuted this or any other molecular clock model. As it stands, this paragraph is purely background knowledge and a candidate for deletion in the interest of shortening the Discussion.

      We agree that this part is not directly related to the data presented in the manuscript. We, therefore, omitted this part in the revised version of the manuscript to keep the discussion concise and focused on the results.

      Other aspects

      (i) 'Bimodal swimming' was used in the Abstract and later in the text without the term being fully explained. I could interpret it to mean a number of things so some explanation is required before the term is introduced.

      We thank the Reviewer for pointing this out. We provided an explanation for the term “bimodal” in the Results section, where the two clock driven activity bouts are described first, by extending the sentence in lines 161-164, which now reads:

      “This suggests that the circadian clock drives a distinct bimodal activity pattern with two activity peaks in one day, i.e. the evening and late-night activity bouts. In contrast, the morning activity bout is triggered by the onset of illumination in the experimental set-up.”

      (ii) Midnight sinking - I was struck by Figure 2b with regard to the dip in activity after the initial ascent, as well as the rise in activity predawn. Cushing (1951) Biol Rev 26: 158-192 describes the different phases of a DVM common to a number of marine organisms observed in situ, where there is a period of midnight sinking following the initial dusk ascent and a dawn rise prior to dawn descent. Tarling et al (2002) observed a midnight sinking pattern in Calanus finmarchicus and considered whether it is a response to feeding satiation or predation avoidance (i.e. exogenous factors). Evidence from the present study indicates that midnight sinking (and potential dawn rise) behaviour could alternatively be under endogenous control to a greater or lesser degree. This is something that should certainly be mentioned in the Discussion, possibly in place of the molecular discussion element mentioned above - possibly adding to the paragraph Lines 303-319.

      We would like to thank the Reviewer for pointing this out and agree that adding the idea of an endogenous control of midnight sinking would be interesting to the discussion. We added the following section to the Discussion (lines 335-343):

      “Interestingly, the decrease in clock-controlled swimming activity during the early night, right after the evening activity bout, may further facilitate a phenomenon called “midnight sinking”, which describes the sinking of animals to intermediate depths after the evening ascent, followed by a second rise to the surface before the morning descent. This behavior has been observed in a number of zooplankton species, including calanoid copepods (see 69, 70 and references therein) and krill (71). While previous studies suggested several exogenous factors, such as satiation or predator presence, as drivers of the midnight sink (69, 70), our study suggests that this pattern may be partly under endogenous control.”

      (iii) Lines 200-207 - I struggled to follow this argument regarding Piccolin et al identifying a 12 h rhythm whereas the present study indicates a ~24 h rhythm. Is one contradicting the other - please make this clear.

      In our study, we found that the circadian clock drives a bimodal pattern of swimming activity in krill, meaning it controls two bouts of activity in a 24-hour cycle. Piccolin et al. (2020) identified a swimming activity pattern of ~12 h (i.e. two peaks in 24 h) at the group level, which aligns with our findings at the individual level. We revised the Section in the discussion for more clarity, which now reads:

      “Data from Piccolin et al. (20) showed a strong damping of the amplitude and an indication of a remarkably short (~12 h) free-running period (FRP) of vertical swimming behavior in a group of krill under constant darkness (20). The short period found in Piccolin et al. (20) is in line with our finding of a bimodal activity pattern under DD conditions at the individual level, suggesting that the ~12 h rhythm in group swimming behavior in Piccolin et al. (20) could have resulted from a bimodal activity pattern at the individual level, as found in our study.” (lines 212-219).

      (iv) Although I agree that the hydroacoustic data should be included and is generally supportive of the results, I think that two further aspects should be made clear for context: (a) whether there was any groundtruthing that the acoustic marks were indeed krill and not potentially some other group known to perform DVM, such as myctophids; (b) how representative were these patterns - I have a sense that they were heavily selected to show only ones with prominent DVM as opposed to other parts of the dataset where such a pattern was less clear - I am aware of a lot of krill research where DVM is not such a clear pattern and it is disingenuous to provide these patterns as the definitive way in which krill behaves. I ask this be made clear to the reader (note also that there is a suggestion of midnight sinking in Fig 5b on 28/2).

      To clarify the mentioned points concerning the hydroacoustic data:

      a) As mentioned in the Methods section, only hydroacoustic data during active fishing was included in the analysis. E. superba occurs in large monospecific aggregations, and the fishery actively targets E. superba and monitors their catch and the proportion of non-target species continuously with cameras. Krill fishery bycatch rates are very low (0.1–0.3%, Krafft et al. 2022), and fishing operations would stop if non-target species were caught in significant proportions at any time. Therefore, and supported by our own observations when we conducted the experiments, we argue that it is a valid assumption that E. superba predominantly causes the backscattering signal shown in Figure 5 (now Fig. 6).

      b) We are aware of the fact that DVM patterns of Antarctic krill are highly variable and that normal DVM patterns are not necessarily the rule (e.g. see our cited study on the plasticity of krill DVM by Bahlburg et al. 2023). The visualized data were not selected for their DVM pattern but represent the period directly preceding the sampling for behavioral experiments in four seasons (experiment 2), including the day of sampling. These periods were chosen to assess the DVM behavior of krill swarms in the field in the days before and during the sampling for behavioral experiments.

      To improve understanding, we modified the description in the Results, Discussion, and Methods sections, as well as the caption of Figure 5 (now Fig. 6), which now read:

      “To investigate whether krill swarms exhibited daily behavioral patterns in swimming behavior in the field before they were sampled for seasonal experiments, hydroacoustic data were recorded from the fishing vessel, continuously over a three-day period prior to sampling for the seasonal experiments described above…” (lines 191-194).

      “Furthermore, hydroacoustic recordings demonstrate that most krill swarms sampled exhibited synchronized DVM in the field in the days directly before sampling for behavioral experiments, indicating that in this region, krill remain behaviorally synchronized across a wide range of photoperiods.” (lines 397-400).

      “Hydroacoustic data were collected using a hull-mounted SIMRAD ES80 echosounder (Kongsberg Maritime AS) aboard the Antarctic Endurance, covering three days before the sampling for each of the seasonal behavioral experiments of experiment 2” (lines 512-515).

      “We only included data from active fishing periods, when the vessel was specifically targeting E. superba, which occurs in large monospecific aggregations. Further, krill fishery bycatch rates are very low (0.1-0.3%, 84), which makes it highly probable that the recorded signal represents krill swarms.” (lines 523-526).

      “Hydroacoustic recordings showing the vertical distribution of krill swarms in the upper water column (<220 m) below the vessel, visualized by the mean volume backscattering signal (200 kHz), on the three days prior to krill sampling for experiments…” (lines 802-804).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As noted in the public review, this is a logical and well-written manuscript. I have very few comments to consider addressing.

      The Results lead with a paragraph outlining the experimental approach. This is good, but you use the term "experiments" to refer to both the two sets, and the two or four subsets of experiments. Perhaps consider the subset experiments as "treatments"? I understood what you meant, but it took a few read-throughs to be sure I got it.

      We thank the reviewer for pointing this out and changed the nomenclature of the experiments throughout the manuscript. We now refer to the two sets of experiments as experiment 1 and 2, to the subsets of experiment 1 as “short day treatment” and “long day treatment”, and to the subsets of experiment 2 as summer treatment, late summer treatment, autumn treatment, and winter treatment. We also believe that the new Figure 1 now makes the experimental design easier to follow.

      Ln 140: "...off and decrease at lights-on."

      We adjusted the sentence accordingly.

      Ln 244: Can you define "extreme photic conditions"? I get what you mean, but to be clear to the reader this would help.

      We adjusted the sentence, which now reads:

      “This could confer a significant adaptive advantage to species inhabiting environments characterized by extreme photic conditions (53, 54, 60), such as phases of polar night or midnight sun as well as rapid changes in daylength, or species that rely on precise photoperiodic time measurement for accurate seasonal adaptation.” (lines 258-261).

      Figures: Consider adding an LSP for groups in Fig 1. Also, it would be useful to have LSP period estimates for each individual tested. This could be a separate table, or it could be added to the individual activity plots. Should S3 and S4 be reversed?

      We thank the reviewer for their suggestion and added an LSP as figure 1d (now Fig. 2d) to statistically support the group activity shown in Figure 1c (now Fig. 2c) as suggested. We added the individual animals' LSP period estimates to supplementary figures S2, S7, S8, S9, and S10. We also reversed Figures S3 and S4 to match the appearance in the main text. 
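      Assuming LSP here refers to a Lomb-Scargle periodogram, a per-animal period estimate can be sketched as below. This is an illustrative pure-Python implementation of the classic normalised periodogram, not the software used in the study; the trial-period grid and the choice of the highest peak as the estimate are assumptions, and the significance threshold that would normally accompany such a plot is omitted.

```python
import math
from typing import List, Sequence


def lomb_scargle(t: Sequence[float], y: Sequence[float],
                 periods: Sequence[float]) -> List[float]:
    """Normalised Lomb-Scargle power at each trial period (same units as t)."""
    n = len(y)
    mean = sum(y) / n
    yc = [v - mean for v in y]
    var = sum(v * v for v in yc) / (n - 1)
    power = []
    for p in periods:
        w = 2.0 * math.pi / p  # angular frequency of the trial period
        # Time offset tau that makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc_c = sum(v * ci for v, ci in zip(yc, c))
        yc_s = sum(v * si for v, si in zip(yc, s))
        power.append((yc_c ** 2 / sum(ci * ci for ci in c)
                      + yc_s ** 2 / sum(si * si for si in s)) / (2 * var))
    return power


def dominant_period(t: Sequence[float], y: Sequence[float],
                    periods: Sequence[float]) -> float:
    """Trial period with the highest Lomb-Scargle power."""
    power = lomb_scargle(t, y, periods)
    return periods[max(range(len(periods)), key=power.__getitem__)]
```

      As a check, a 24 h sinusoid sampled hourly for ten days returns 24.0 on a 6-36 h grid in 0.5 h steps, while a 12 h sinusoid returns 12.0, the kind of ~12 h peak a bimodal daily pattern would produce.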

      Fig 5: are the light regime bars for b and c correct? They look similar, but they are only 15 days apart, so perhaps they are correct as is.

      We double checked the light regime bars in Fig. 5b and c (now 6b and c) and they are correct as is.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Kaya et al. studies the effect of food consumption on hippocampal sharp wave ripples (SWRs) in mice. The authors use multiple foods and forms of food delivery to show that the frequency and power of SWRs increase following food intake, and that this effect depends on the caloric content of food. The authors also studied the effects of the administration of various food-intake-related hormones on SWRs during sleep, demonstrating that ghrelin, but not GLP1, insulin, or leptin, negatively affects SWR rate and power. Finally, the authors use fiber photometry to show that GABAergic neurons in the lateral hypothalamus increase their activity during a SWR event.

      Strengths:

      The experiments in this study seem to be well performed, and the data are well presented visually. The data support the main conclusions of the manuscript that food intake enhances hippocampal SWRs. Taken together, this study is likely to have an impact on the study of how feeding affects sleep behavior, as well as on the study of the role of hippocampal SWRs in metabolism.

      Weaknesses:

      Details of experiments are missing in the text and figure legends. Additionally, the writing of the manuscript could be improved.

      We thank the reviewer for their favorable assessment of the work and its potential impact. We have added all requested details in the text and figure legends and revised the wording of the manuscript to improve its clarity.

      Reviewer #2 (Public review):

      Summary:

      Kaya et al uncover an intriguing relationship between hippocampal sharp wave-ripple production and peripheral hormone exposure, food intake, and lateral hypothalamic function. These findings significantly expand our understanding of hippocampal function beyond mnemonic processes and point a direction for promising future research.

      Strengths:

      Some of the relationships observed in this paper are highly significant. In particular, the inverse relationship between GLP1/Leptin and Insulin/Ghrelin is compelling, as this aligns well with the opposing functions of these hormones on satiety.

      Weaknesses:

      I would be curious if there were any measurable behavioral differences that occur with different hormone manipulations.

      We thank the reviewer for their favorable assessment of the work and its contribution to our understanding of non-mnemonic hippocampal function. Whether there are behavioral differences that occur following administration of the different hormones is a great question, yet unfortunately our study design did not include fine behavioral monitoring to the degree that would allow answering it. While some previous studies have partially addressed the behavioral consequences of the delivery of these hormones (and we reference these studies in our Discussion), how these changes may interact with the hippocampal and hypothalamic effects we observe is a very interesting next step.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Kaya et al. explores the effects of feeding on sharp wave-ripples (SWRs) in the hippocampus, which could reveal a better understanding of how metabolism is regulated by neural processes. Expanding on prior work that showed that SWRs trigger a decrease in peripheral glucose levels, the authors further tested the relationship between SWRs and meal consumption by recording LFPs from the dorsal CA1 region of the hippocampus before and after meal consumption. They found an increase in SWR magnitude during sleep after food intake, in both food restricted and ad libitum fed conditions. Using fiber photometry to detect GABAergic neuron activity in the lateral hypothalamus, they found increased activity locked to the onset of SWRs. They conclude that the animal's satiety state modulates the amplitude and rate of SWRs, and that SWRs modulate downstream circuits involved in regulating feeding. These experiments provide an important step forward in understanding how metabolism is regulated in the brain. However, currently, the paper lacks sufficient analyses to control for factors related to sleep quality and duration; adding these analyses would further support the claim that food intake itself, as opposed to sleep quality, is primarily responsible for changes in SWR activity. Adding this, along with some minor clarifications and edits, would lead to a compelling case for SWRs being modulated by a satiety state. The study will likely be of great interest in the field of learning and memory while carrying broader implications for understanding brain-body physiology.

      Strengths:

      The paper makes an innovative foray into the emerging field of brain-body research, asking how sharp wave-ripples are affected by metabolism and hunger. The authors use a variety of advanced techniques including LFP recordings and fiber photometry to answer this question. Additionally, they perform comprehensive and logical follow-up experiments to the initial food-restricted paradigm to account for deeper sleep following meal times and the difference between consumption of calories versus the experience of eating. These experiments lay the groundwork for future studies in this field, as the authors pose several follow-up questions regarding the role of metabolic hormones and downstream brain regions.

      We thank the reviewer for their appreciation and constructive review of the work.

      Weaknesses:

      Major comments:

      (1) The authors conclude that food intake regulates SWR power during sleep beyond the effect of food intake on sleep quality. Specifically, they made an attempt to control for the confounding effect of delta power on SWRs through a mediation analysis. However, a similar analysis is not presented for SWR rate. Moreover, this does not seem to be a sufficient control. One alternative way to address this confound would be to subsample the sleep data from the ad lib and food restricted conditions (or high calorie and low calorie, etc), to match the delta power in each condition. When periods of similar mean delta power (i.e. similar sleep quality) are matched between datasets, the authors can then determine if a significant effect on SWR amplitude and rate remains in the subsampled data.
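      The delta-power matching suggested here can be sketched as follows. This is a hypothetical illustration, not an analysis performed in the study: it assumes each NREM epoch is summarised as a (delta power, SWR measure) pair, uses equal-width delta-power bins, and draws a seeded random subsample so that both conditions contribute the same number of epochs to every bin they share.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Epoch = Tuple[float, float]  # (delta_power, swr_measure) for one NREM epoch


def _bin(epochs: Sequence[Epoch], width: float) -> Dict[int, List[Epoch]]:
    """Group epochs into equal-width delta-power bins."""
    bins: Dict[int, List[Epoch]] = defaultdict(list)
    for e in epochs:
        bins[int(e[0] // width)].append(e)
    return bins


def match_delta_power(cond_a: Sequence[Epoch], cond_b: Sequence[Epoch],
                      width: float, seed: int = 0) -> Tuple[List[Epoch], List[Epoch]]:
    """Subsample both conditions so each shared delta-power bin holds equal epoch counts."""
    rng = random.Random(seed)  # fixed seed for a reproducible subsample
    bins_a, bins_b = _bin(cond_a, width), _bin(cond_b, width)
    out_a: List[Epoch] = []
    out_b: List[Epoch] = []
    for k in sorted(bins_a.keys() & bins_b.keys()):  # bins present in both conditions
        n = min(len(bins_a[k]), len(bins_b[k]))
        out_a.extend(rng.sample(bins_a[k], n))
        out_b.extend(rng.sample(bins_b[k], n))
    return out_a, out_b
```

      SWR amplitude and rate would then be compared on the matched subsamples; the cost, as noted in the reply, is that unmatched epochs are discarded.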

      This is an important point that we believe we addressed in a few complementary ways. First, the mediation analysis we implemented measures the magnitude and significance of the contribution of food to SWR power after accounting for the effects of delta power, showing a highly significant food-SWR contribution. While the objective of subsampling is similar, mediation is a more statistically robust approach as it models the relationship between food, SWR power, and delta power in a way that explicitly accounts for the interdependence of these variables. Further, subsampling introduces the risk of losing statistical power by reducing the sample size, due to exclusion of data that might contain relevant and valuable information. Mediation analysis, on the other hand, uses the full dataset and retains statistical power while modeling the relationships between variables more holistically. However, as we were not satisfied with a purely analytical approach to test this issue, we carried out a new set of experiments in ad-libitum fed mice, where there is no concern of food restriction impairing sleep quality in the presleep session. In these conditions food amount also significantly correlated with, and showed significant mediation of, the SWR power change. Finally, we acknowledge and discuss this point in the Discussion, highlighting that given the known relationship between cortical delta and SWRs, it is challenging to fully disentangle these signals.
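      For readers less familiar with mediation analysis, the single-mediator decomposition can be sketched with ordinary least squares. This assumes the classic product-of-coefficients formulation (food amount X, delta power M, SWR power change Y); the authors' exact model and their significance testing (commonly a bootstrap of the indirect effect) are not specified here and are not reproduced. The direct effect c' is the food contribution that remains after accounting for delta power.

```python
from typing import Dict, List, Sequence


def _centered(v: Sequence[float]) -> List[float]:
    m = sum(v) / len(v)
    return [x - m for x in v]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def mediation(x: Sequence[float], m: Sequence[float],
              y: Sequence[float]) -> Dict[str, float]:
    """OLS single-mediator decomposition: x = food, m = delta power, y = SWR power change."""
    xc, mc, yc = _centered(x), _centered(m), _centered(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)
    total = sxy / sxx                 # c: slope of y on x alone
    a = sxm / sxx                     # a: slope of m on x
    det = sxx * smm - sxm ** 2        # zero only if x and m are collinear
    direct = (smm * sxy - sxm * smy) / det   # c': effect of x on y holding m fixed
    b = (sxx * smy - sxm * sxy) / det        # b: effect of m on y holding x fixed
    return {"total": total, "direct": direct, "a": a, "b": b, "indirect": a * b}
```

      With OLS the total effect splits exactly into direct and indirect parts (c = c' + a·b), which is why a non-zero direct effect c' corresponds to a food-SWR relationship that is not carried by delta power.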

      (2) Relatedly, are the animals spending the same amount of time sleeping in the ad lib vs. food restricted conditions? The amount of time spent sleeping could affect the probability of entering certain stages of sleep and thus affect SWR properties. A recent paper (Giri et al., Nature, 2024) demonstrated that sleep deprivation can alter the magnitude and frequency of SWRs. Could the authors quantify sleep quantity and control for the amount of time spent sleeping by subsampling the data, similar to the suggestion above?

      Following the reviewer’s comment, we have quantified and compared the amount of time spent in NREM sleep in the Pre and Post session pairs in which the animals were food restricted, with 0-1.5 g of chow given between the sleep sessions. We found that there was no significant difference in the amount of time spent in NREM sleep in the Pre and Post sessions. We have added this result to the Results section of the manuscript and as a new Supplementary Fig. 1. 

      Additionally, we have added details to the Methods section that were missing in the original submission that are relevant to this point. Specifically, within the sleep sessions, the ongoing sleep states were scored using the AccuSleep toolbox (https://github.com/zekebarger/AccuSleep) using the EEG and EMG signals. NREM periods were detected based on high EEG delta power and low EMG power, REM periods were detected based on high EEG theta power and low EMG power, and Wake periods were detected based on high EMG power. Importantly, only NREM periods were included for subsequent SWR detection, quantification and analyses (in particular, reported SWR rates reflect the number of SWRs per second of NREM sleep). 
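      The rate definition above (SWRs per second of NREM sleep) can be sketched as follows. It assumes sleep states are available as one label per fixed-length epoch and SWR detection times are in seconds; the epoch length and function name are hypothetical, and the sketch does not call AccuSleep itself.

```python
import math
from typing import Sequence


def nrem_swr_rate(swr_times: Sequence[float], states: Sequence[str],
                  epoch_s: float) -> float:
    """SWRs per second of NREM sleep.

    states: one scored label per epoch ("NREM", "REM" or "Wake"), where
    epoch k spans [k * epoch_s, (k + 1) * epoch_s).
    """
    nrem_epochs = {k for k, s in enumerate(states) if s == "NREM"}
    if not nrem_epochs:
        return math.nan  # rate undefined without any NREM sleep
    # Count only SWRs that fall inside NREM epochs.
    n_swr = sum(1 for t in swr_times if int(t // epoch_s) in nrem_epochs)
    return n_swr / (len(nrem_epochs) * epoch_s)
```

      Normalising by NREM time rather than by session length keeps the rate comparable between Pre and Post sessions even if the animals sleep for different amounts of time.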

      (3) Plot 5I only reports significance but does not clearly show the underlying quantification of LH GABAergic activity. Upon reading the methods for how this analysis was conducted, it would be informative to see a plot of the pre-SWR and post-SWR integral values used for the paired t-test whose p-values are currently shown. For example, these values could be displayed as individual points overlaid on a pair of box-and-whisker plots of the pre- and post-distribution within the session (perhaps for one example session per mouse with the p-value reported, to supplement a plot of the distribution of p-values across sessions and mice). If these data are non-normal, the authors should also use a non-parametric statistical test.
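      The requested per-session comparison can be sketched as below: integrate the dF/F trace in a window before and after each SWR onset, then test the paired differences. Window length, sampling interval and function names are hypothetical. As the non-parametric option we show a sign-flip permutation test, substituted here for the Wilcoxon signed-rank test only to keep the example dependency-free; it makes no normality assumption about the differences.

```python
import random
from typing import Sequence, Tuple


def trapezoid(values: Sequence[float], dt: float) -> float:
    """Trapezoidal integral of evenly sampled values."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def pre_post_integrals(trace: Sequence[float], onset: int, n_win: int,
                       dt: float) -> Tuple[float, float]:
    """Integrals of a dF/F trace over n_win samples before and after an SWR onset.

    Assumes onset - n_win >= 0 and onset + n_win < len(trace).
    """
    pre = trace[onset - n_win: onset + 1]
    post = trace[onset: onset + n_win + 1]
    return trapezoid(pre, dt), trapezoid(post, dt)


def sign_flip_p(diffs: Sequence[float], n_perm: int = 5000, seed: int = 0) -> float:
    """Two-sided paired permutation p-value for the mean of (post - pre) differences."""
    rng = random.Random(seed)
    obs = abs(sum(diffs))
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis each paired difference is equally likely to flip sign.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= obs - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

      The (hits + 1) / (n_perm + 1) convention keeps the estimated p-value strictly positive, so a session with a consistent post-SWR increase yields a small but non-zero p-value.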

      We have generated the summary plots the reviewer requested and have now included them in Supplementary Fig. 2. 

      Minor comments:

      (4) A brief explanation (perhaps in the discussion) of what each change in SWR property (magnitude, rate, duration) could indicate in the context of the hypothesis may be helpful in bridging the fields of metabolism and memory. For example, by describing the hypothesized mechanistic consequence of each change, could the authors speculate on why ripple rate may not increase in all the instances where ripple power increases after feeding? Why do the authors speculate that ripple duration does not increase, given that prior work (Fernandez-Ruiz et al. 2019) has shown that prolonged ripples support enhanced memory?

      This is an interesting point and we have added a section to the Discussion to discuss it (pg. 17, last paragraph).

      (5) The authors suggest that "SWRs could modulate peripheral metabolism" as a future implication of their work. However, the lack of clear effects from GLP-1, leptin and insulin complicates this interpretation. It might be informative for readers if the authors expanded their discussion of what specific role they speculate that SWRs could play in regulating metabolism, given these negative results.

      We have added a section to the Discussion proposing potential reasons for this point (pg. 16, last paragraph).

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      Major Comments:

      (1) The experiments involve very precise windows of time for sleeping and eating that seem impossible to control. For example, the authors state that for the experiments in Figure 1, there was a 2-h sleep period, followed by a 1-h feeding period, followed by another 2-h sleep period. Without sleep deprivation procedures or other environmental manipulations, how can these periods be so well-defined? Even during the inactive period, mice typically don't sleep for 2-h bouts at once, and the addition of food would not likely lead to an exact 1-h period of wakefulness in the middle. The validity of these experimental times would be more believable if the authors provided much more data on these sessions. For example, the authors could provide a table or visual display of data for the actual timing of the pre-sleep, eating, and post-sleep phases with exact time measurements and/or visual display of sleep versus wakefulness.

      This is an important point, which we were not clear enough about in the original submission. While the durations of the Pre-sleep, Wake and Post-sleep sessions were indeed 2 h, 1 h and 2 h respectively, the animals did not actually sleep during the entirety of the sleep sessions. Importantly, we performed sleep state scoring on all sessions, and only analyzed identified NREM sleep for all SWR analyses. Following the reviewer’s comment (and that of Reviewer 3), we have quantified and compared the amount of time spent in NREM sleep in the Pre and Post session pairs in which the animals were food restricted and 0-1.5 g of chow were given between the sleep sessions. We found that there was no significant difference in the amount of time spent in NREM sleep in the Pre and Post sessions. We have added this result to the Results section of the manuscript and as a new Supplementary Fig. 1.

      Additionally, we have added details to the Methods section that were missing in the original submission that are relevant to this point. Specifically, within the sleep sessions, the ongoing sleep states were scored using the AccuSleep toolbox (https://github.com/zekebarger/AccuSleep) using the EEG and EMG signals. NREM periods were detected based on high EEG delta power and low EMG power, REM periods were detected based on high EEG theta power and low EMG power, and Wake periods were detected based on high EMG power. Importantly, only NREM periods were included for subsequent SWR detection, quantification and analyses (in particular, reported SWR rates reflect the number of SWRs per second of NREM sleep). 

      (2) I may have missed this (although I tried searching in the text and figure legend), but the authors did not state the difference between green versus red bar colors in Figure 1 C-E. For Figures 1 F-J, do the individual dots represent both the test (fed) animals and control animals, or just the test animals?

      We thank the reviewer for the opportunity to clarify these points. Red bars in Fig. 1C-E represent the SWR changes observed following delivery of 0.5 g of chow or more, while the green bars represent the changes observed following delivery of less than 0.5 g. Fig. 1F-J includes both the experimental and control animals, with the control animals shown as having received 0 g of food. This information has now been added to the figure legend.

      (3) For the jello experiments in Figure 3, was there only 1 trial per animal? Previous studies show that animals learn the caloric value of jello after subsequent trials, so whether or not multiple trials took place in each animal is important for interpretation of the results.

      In Figure 3, the datapoints within each panel represent different animals and this information has now been added to the figure legend. Nevertheless, the animals were previously habituated to all foods, including regular jello, sugar-free jello and chocolate. While we consider it unlikely that this prior experience was sufficient to underlie the differential effects on SWRs, we cannot fully rule out the possibility that it provided some ability to predict the caloric value and consequences of the different foods. We have added details to the acknowledgement of this point in the Discussion (pg. 17, second paragraph).

      (4) The experiments in Figure 5 are informative but don't relate to the experiments in the rest of the study. It is difficult to interpret their meaning given that these experiments take place over seconds while the other experiments take place over hours. Some attempt should be made to bridge these experiments over the timescales relevant for the behaviors studied in Figures 1-4.

      We have now further acknowledged and discussed the point that our investigation is limited to the timescale of seconds around SWRs, and thus identified a potential communication channel, but whether and how this communication changes across hours following feeding remains for future studies (pg. 18, second paragraph).

      (5) Figure 5B should depict the x-axis in seconds, not an arbitrary set of times from a recording.

      We have replaced these with a time scale bar.

      Minor Comments:

      (6) The writing of the manuscript can be improved in many places:

      Sometimes the writing could be more precise. For example, the Abstract states: "hippocampal sharp wave ripples (SWRs)... have been shown to influence peripheral glucose metabolism." Could this be written in a more informative way, rather than just staying "has been shown to influence?" A few more words would provide a lot more information. Similarly, at the end of the Introduction: "we set out to test the hypothesis that SWRs are modulated following meal times as part of the systems-level response to changing metabolic needs." This is not a strong hypothesis... could it be written to boldly state how the SWRs will be modulated (increase or decrease) and provide more assertive information?

      The writing can be grandiose at times. Phrases such as "life is a continuous journey" or "the hypothalamus is a master regulator of homeostasis" are a bit sophomoric and too colloquial.

      Finally, a representative recording should be referred to as just that, a "representative recording," as opposed to a "snippet," which is also colloquial. This word is used in the figure legends to Figures 1 and 5, and misspelled as "sinpper" in Figure 1.

      We have reworded all these sentences and phrases to make them clearer, more concrete and more formal.

      (7) The methods state that the study used both male and female mice. Were they used in equal numbers across experiments?

      Only one female was used in the final dataset, and we have corrected the wording accordingly.

      Reviewer #2 (Recommendations for the authors):

      Great paper!

      Thanks!

      Reviewer #3 (Recommendations for the authors):

      Below are some minor requests for clarification, including in figures:

      (1) Fig. 5H y-axis should say "normalized dF/F."

      Done

      (2) Fig. 1B is missing a y-axis label. It may be clearer to display separate y-axis scale bars for each component (SWR envelope, ripple-filtered amplitude, etc).

      Done

      (3) Please include labels for brain areas and methodological components in Fig. 5A.

      Done

      (4) Should Fig. 5B have the same y-axis or scale bars as 1B?

      We have edited the figure labels and legends to be visually similar.

      (5) In Fig. 5J, is the y-axis a count of sessions?

      Yes, we have added that to the y-axis label.

      (6) Could the authors please clarify whether the sugar-free jello was sweetened with an artificial sweetener? If so, this is a robust control for the rewarding nature of the two jellos, so a quick clarification would highlight this strength of the experiment.

      We thank the reviewer for this great point. Indeed, the sugar free jello contained artificial sweeteners (Aspartame and Acesulfame Potassium). We have added this information to the Results and Methods.

      (7) It appears in Fig. 5 that there may be a reliable dip in activity at the time of SWR onset, followed by the increase afterward, as shown in the example FP trace and the individual ripple-triggered traces. Is this indeed the case, and does this dip fall significantly below baseline? This characterization would be interesting, but I acknowledge it is not necessarily crucial to the study to include.

      This would indeed be an interesting finding, but upon examination and statistical testing, we found that this is not the case. We believe this may appear as such due to the normalization of the traces.

      (8) The authors mention a reduction in ripple rate following insulin under food restriction as the only significant effect for insulin, GLP-1, and leptin, yet there was also a significant increase (at p<0.05) in ripple duration for GLP-1 in the ad lib condition. Is this not considered noteworthy?

      This is a fair point and we have reworded the description of this result to simply state that there were no robust, consistent, dose-dependent effects of GLP-1, leptin and insulin on SWR attributes.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This study presents evidence that a special group of place cells, those tuned to fast-gamma oscillations, play a key role in theta sequence development. How theta sequences are formed and developed during experience is an important question, because these sequences have been implicated in several cognitive functions of place cells, including memory-guided spatial navigation. The revised version of this paper has been significantly improved. Major concerns in the previous round of review on technical and conceptual aspects of the relationship between gamma oscillations and theta sequences are addressed. The main conclusion is supported by the data presented.

      Reviewer #2 (Public review):

      The authors have conducted new analysis to address the issues I and the other reviewers raised in our original revision. As a result, the revised manuscript has been substantially improved.

      We thank the two reviewers for their positive comments.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      There are, however, still a few remaining issues that need further clarification.

      - Despite the authors' explanation and comparison with Kitanishi et al., 2015, Neuron, I still find that the reduced number of significantly gamma phase-locked cells is at odds with most previous reports (e.g., Csicsvari et al., 2003; Colgin et al., 2009; Belluscio et al., 2012; Schomburg et al., 2014; Cabral et al., 2014; Fernandez-Ruiz et al., 2017; Lopes dos Santos et al., 2018). There can be several issues to explain this difference, like the choice of LFP reference channel. The authors should at least acknowledge this difference in the text.

      We thank the reviewer for this suggestion. We have discussed potential reasons for the different proportion of gamma phase-locked cells in the Discussion (lines 367-380).

      - The new Figure R2 is very useful and should be included in the manuscript. It would be even better to expand the frequency range to higher frequencies to show where the maximum peak is. Still, the potential contribution of spike leakage should be acknowledged. While I agree that it will not account for all fast gamma spike modulation, it is certainly a contributing factor. Further evidence of this is that the coupling strength seems to keep increasing towards the supra-gamma frequency range in Fig R2. This is to be expected given that the authors have used the local LFP from the same tetrode where cells were recorded, which is never a good practice.

      We thank the reviewer for this suggestion. Fig. R2 has now been moved into the manuscript as part of Figure 2-figure supplement 2 (lines 133-135). Regarding the contribution of spike leakage from using the local LFP, we also detected FG-cells using the LFP from a different tetrode, i.e., the central tetrode of the bundle, located in the cell body layer, and found an approximately equal proportion of FG-cells phase-locked to ~75 Hz (Fig. R3, now Figure 2-figure supplement 2C-F). Thus, we think that using the local LFP does not affect the main conclusion, and we have decided to keep the original results. We have also acknowledged the potential contribution of spike leakage in the Discussion (lines 372-377).

      - From the authors answer I understand that recordings were almost exclusively conducted from the deep CA1 pyramidal layer. This would preclude any meaningful interpretation of the deep/ superficial differences in the distribution of FG and NFG cells. This is not a crucial point for the paper but needs to be acknowledged.

      We thank the reviewer for this suggestion. We have acknowledged in the Discussion (lines 380-386) that, because our recordings were made almost exclusively from the deep CA1 pyramidal layer, the deep/superficial differences in the distribution of FG- and NFG-cells cannot be meaningfully interpreted.

      - I am afraid that the authors interpreted my comment about authorship in the opposite way that I intended. I meant that the usual practice is that the last author of the manuscript is the person who has been the main intellectual driver of the work, not the most senior one necessarily. I guess that is Dr. Zheng not Dr. Ming. However, I leave this decision to the discretion of the authors.

      We thank the reviewer for this rigorous consideration. Dr. Ming and Dr. Zheng were both main intellectual drivers of this work. Therefore, we have decided to keep the current author order in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1 (Public review):

      Summary:

      This very interesting manuscript proposes a general mechanism for how activating signaling proteins respond to species-specific signals arising from a variety of stresses. In brief, the authors propose that the activating signal alters the structure by a universal allosteric mechanism.

      Strengths:

      The unitary mechanism proposed is appealing and testable. They propose that the allosteric module consists of crossed alpha-helical linkers with similar architecture and that their attached regulatory domains connect to phosphatases or other molecules through coiled-coil domains, such that the signal is transduced via rigidifying the alpha helices, permitting downstream enzymatic activity. The authors present genetic and structural prediction data in favor of the model for the system they are studying, and stronger structural data in other systems.

      Weaknesses:

      The evidence is indirect - targeted mutations, structural predictions, and biochemical data. Therefore, these important generalizable conclusions are not buttressed by impeccable data, which would require doing actual structures in B. subtilis, confirming experiments in other organisms, and possibly co-evolutionary coupling. In the absence of such data, it is not possible to rule out variant models.

      We thank the reviewer for their feedback. A challenge of studying flexible proteins is that it is often not possible to directly obtain high resolution structural data. For the case of B. subtilis RsbU, the independent experimental approaches we applied (including two unbiased genetic screens, targeted mutagenesis, SAXS, enzymology, and structure prediction, which includes evolutionary coupling) converged upon a model for activation, which we feel is well supported. Frustratingly, our attempts at determining high resolution experimental structures have been unsuccessful, which we think is due to the flexibility of the proteins revealed by our SAXS experiments. For example, we collected X-ray diffraction data from crystals of a fragment of B. subtilis RsbU containing the N-terminal domain and linker in which the linker was almost entirely disordered in the maps. We agree that doing experiments in other organisms would be valuable next steps to test the hypothesis that this coiled-coil based transduction mechanism is conserved across species, and will modify the text to differentiate this more speculative section of the manuscript.

      We have modified the abstract to read:

      “This coiled-coil linker transduction mechanism additionally suggests a resolution to the mystery of how shared sensory domains control serine/threonine phosphatases, diguanylate cyclases and histidine kinases.”

      We have modified the results to read:

      "These predictions suggest a testable hypothesis that RsbP is controlled through an activation mechanism similar to that of RsbU (Fig. 5A)”

      “From this analysis, we speculate that linker-mediated phosphatase domain dimerization is an evolutionarily conserved, adaptable mechanism to control PPM phosphatase activity.”

      Based on this critique (and the critiques of the other reviewers), we plan to do energetic analysis of the predicted coiled coils from the enzymes we analyzed from other species and to incorporate this into the manuscript.

      We have modified the results to read:

      “Consistent with a model in which the stability of the linker plays a conserved regulatory role, the AlphaFold2 models for many of the predicted structures have unfavorable polar residues buried in the coiled-coil interface (positions a and d, for which non-polar residues are most favorable) (Figure 5 – figure supplement 2).”

      Finally, in the manuscript, we have highlighted that this mechanism is not the only mechanism for activation of other proteins with effector domains connected to linkers, but rather one of many mechanisms (Fig 5G). The reviewer additionally made helpful suggestions about the text in detailed comments that we will incorporate as appropriate.

      Reviewer #2 (Public review):

      Summary:

      While bacteria have the ability to induce genes in response to specific stresses, they also use the General Stress Response (GSR) to deal with growth conditions that presumably include a larger range of stresses (for instance, stationary phase growth). The activation of GSR-specific sigma factors is frequently at the heart of the induction of a GSR. Given the range of stresses that can lead to GSR induction, the regulatory inputs are frequently complex. In B. subtilis, the stressosome, a multi-protein complex, contains a set of proteins that, upon appropriate stresses, initiate partner switching cascades that free the sigma B sigma factor from an anti-sigma. The focus here is on the mode of activation of RsbU, a serine/threonine phosphatase of the PPM family, leading to sigB activation. RsbT, a component of the stressosome, interacts with RsbU upon stress, activating the phosphatase activity. Once active, RsbU dephosphorylates its target (RsbV, an anti-antisigma), which in turn binds the anti-sigma. The conclusion is that flexible linker domains upstream of the phosphatase domain are the target for activation, via binding of proteins to the N-terminal domain, resulting in a crossed-linker dimeric structure. The authors then use the information on RsbU to suggest that parallel approaches are used to activate PPM phosphatases for the GSR response in other bacteria. (Biology vs. Mechanism, evolution?)

      Strengths and Weaknesses:

      Many of these have to do with clarifying what was done and why. This includes the presentation and content of the figures.

      One issue relates to the background and context. A bit more information on the stresses that release RsbT would be useful here. The authors might also consider a figure showing the major conclusions and parallels for SpoIIE activation and possibly other partner switches that are discussed, introducing the switch change more clearly to set the stage for the work here (and the generalization). There are a lot of players to keep track of.

      We plan to carefully review the manuscript to improve the clarity of presentation and background. In particular, we thank the reviewer for pointing out the missing information about the release of RsbT from the stressosome. We will incorporate this information into the introduction and provide an additional figure.

      We have added the following text to the introduction:

      “RsbT is sequestered in a megadalton stress sensing complex called the stressosome, and is released to bind RsbU in response to specific stress signals including ethanol, heat, acid, salt, and blue light”

      We have added a new figure panel (2C) that shows the model for how Q94L, M166V, and RsbT binding induce conformational change of the PPM domain to recruit metal cofactor and activate RsbU (analogous, but slightly different from the mechanism for SpoIIE).

      The reviewer additionally provided detailed helpful comments that we will incorporate in the text and figures.

      Reviewer #3 (Public review):

      Summary:

      The authors present a study building on their previous work on activation of the general stress response phosphatase, RsbU, from Bacillus subtilis. Using computed structural models of the RsbU dimer the authors map previously identified activating mutations onto the structure and suggest further protein variants to test the role of the predicted linker helix and the interaction with RsbT on the activation of the phosphatase activity.

      Using in vivo and in vitro activity assays, the authors demonstrate that linker variants can constitutively activate RsbU and increase the affinity of the protein for RsbT, thus showing a link between the structure of the linker region and RsbT binding.

      Small angle X-ray scattering experiments on RsbU variants alone, and in complex with RsbT show structural changes consistent with a decreased flexibility of the RsbU protein, which is hypothesised to indicate a disorder-order transition in the linker when RsbT binds. This interpretation of the data is consistent with the biochemical data presented by the authors.

      Further computed structure models are presented for other protein phosphatases from different bacterial species and the authors propose a model for phosphatase activation by partner binding. They compare this to the activation mechanisms proposed for histidine kinase two-component systems and GGDEF proteins and suggest the individual domains could be swapped to give a toolkit of modular parts for bacterial signalling.

      Strengths:

      The key mutagenesis data is presented with two lines of evidence to demonstrate RsbU activation - in vivo sigma-b activation assays utilising a beta-galactosidase reporter and in vitro activity assays against the RsbV protein, which is the downstream target of RsbU. These data support the hypothesis for RsbT binding to the RsbU linker region as well as the dimerisation domain to activate the RsbU activity.

      Weaknesses:

      Small angle scattering curves are difficult to unambiguously interpret, but the authors present reasonable interpretations that fit with the biochemical data presented. These interpretations should be considered as good models for future testing with other methods; hydrogen/deuterium exchange mass spectrometry would be a good additional method to use, as exchange rates in the linker region would be affected significantly by the disorder/order transition on RsbT binding.

      We agree with the reviewer that the SAXS data have inherent ambiguity due to the nature of the measurement. However, SAXS is one of the best techniques to directly assess conformational flexibility. Our scattering data for RsbU have multiple signatures of flexibility supporting a high confidence conclusion. While the scattering data support a reduction in flexibility for the RsbT/RsbU complex, we agree that a high resolution structure would be valuable. However, the combination of the scattering data with our biochemical and genetic data supports the validity of the AlphaFold predicted model. We thank the reviewer for the suggestion of future hydrogen/deuterium exchange experiments that would be complementary, but which we feel are beyond the scope of this work.

      The interpretation of the computed structure models should be toned down with the addition of a few caveats related to the bias in the models returned by AlphaFold2. For the full-length models of RsbU and other phosphatase proteins, the relationship of the domains to each other is likely to be the least reliable part of the models - this is apparent from the PAE plots shown in Supplementary Figure 8. Furthermore, the authors should show models coloured by pLDDT scores in an additional supplementary figure to help the reader interpret the confidence level of the predicted structures.

      We thank the reviewer for suggestions on how to clarify the discussion of AlphaFold models. We will decrease the emphasis on the computed models in the text and will add figures with the models colored by the pLDDT scores to aid in the interpretation.

      We have modified the text of the Abstract: “This coiled-coil linker transduction mechanism additionally suggests a resolution to the mystery of how shared sensory domains control serine/threonine phosphatases, diguanylate cyclases and histidine kinases.”

      We have modified the text of the Results: “These predictions suggest a testable hypothesis that RsbP is controlled through an activation mechanism similar to that of RsbU (Fig. 5A).”

      “From this analysis, we speculate that linker-mediated phosphatase domain dimerization is an evolutionarily conserved, adaptable mechanism to control PPM phosphatase activity.”

      We have also added Figure 1 – figure supplement 2 with the AlphaFold2 models colored by the pLDDT scores.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Baral and colleagues investigate the regulatory mechanisms of the General Stress Response (GSR) in Bacillus subtilis, focusing on the phosphatase RsbU and its regulation by the protein RsbT. The GSR is a critical adaptive mechanism that allows bacteria to survive under various stress conditions by reshaping their physiology through a broad transcriptional response. RsbU, a key player in the GSR, facilitates the activation of the transcription factor SigB by dephosphorylating RsbV. This activation is mediated through a partner-switching mechanism involving RsbT. Baral and colleagues use a combination of genetic screening, structural predictions via AlphaFold2, and biophysical techniques such as SAXS and MALS to present a model for how RsbT regulates RsbU. Key findings include the identification of specific amino acid substitutions that enhance RsbU activity, the role of the α-helical linker in RsbU dimerization and activation, and the potential broader conservation of these mechanisms across bacterial species. However, as described below, additional work is required to solidify the results.

      Major Points

      (1) The manuscript is misnamed: it dissects a single step of the signal-transduction pathway regulating the general stress response. Instead, it is rather seeking a generalizable mechanism for kinase-phosphatase interactions across stresses.

      We have edited the title to “A General Mechanism for Initiating the General Stress Response in Bacteria” to reflect that this study addresses the initiating event of the general stress response.

      (2) The genetic screen likely has limitations in detecting all possible variants that could affect RsbU activity. The readout is specific to σ^B activation, and the focus on specific amino acid substitutions may overlook other significant regions or mechanisms involved in the regulation of RsbU, particularly those involving RsbV and RsbT.

      Our screens were specifically designed to identify features of RsbU that contribute to regulation. Importantly, RsbU does not have any known targets other than RsbV and the downstream σ<sup>B</sup> response, but we agree that substitutions in either RsbV or RsbT could influence RsbU activation. In principle our suppressor screen with RsbU<sup>Y28I</sup> could have identified RsbT variants (rsbT was mutagenized in this screen), but we did not identify any such variants in the screen. We conducted a separate screen (published elsewhere) that specifically addressed how RsbU recognizes RsbV.

      (3) The authors largely focus on the biochemical and structural aspects of RsbU regulation. There is limited discussion on the broader functional implications of these findings in the context of bacterial physiology and stress response. Incorporating more in vivo studies to show how these mechanisms impact bacterial survival and adaptation would provide a more comprehensive understanding.

      We appreciate this comment, but did not conduct additional studies of survival and adaptation because the phenotypes of σ<sup>B</sup> deletion in B. subtilis under laboratory conditions are relatively mild and therefore difficult to assay. Future studies to address this in other systems could be highly informative.

      (4) The results primarily support the model of linker-mediated dimerization and rigidity. However, other potential regulatory mechanisms or interacting partners might also play significant roles in RsbU activation. A more thorough exploration of these possibilities would strengthen the study's conclusions.

      One of the major advantages of RsbU as a model for initiation of the general stress response is that the system is discrete, with all evidence pointing to there being a single primary input (RsbT) and output (dephosphorylation of RsbV). While there are other possible variations on the system (for example, RsbU may be directly activated by manganese stress), we focused on this system precisely because of its simplicity.

      (5) While the study presents evidence for the conservation of the described mechanism across different species, this assumption is based on structural predictions and limited experimental data. Broader experimental validation across diverse bacterial species would be necessary to substantiate this claim. Coevolution coupling along with conservation/evolutionary studies could be considered.

      We have altered the language in the paper to emphasize where we are making inferences from predictions that are therefore more speculative. We agree that a more detailed analysis of the evolutionary coupling would likely be fruitful. We note that these couplings are the major driving force of AlphaFold predictions, suggesting that these couplings contributed to the models that we analyzed.

      (6) The reliance on AlphaFold2 for structural predictions introduces potential biases and uncertainties inherent in computational models. Experimental validation of these models through additional techniques such as cryo-EM or X-ray crystallography would strengthen the conclusions.

      We agree with this point, which is why we performed extensive analysis and validation of the models for RsbU using SAXS, genetics, and biochemistry. The proposed techniques are made more challenging by flexibility and heterogeneity, which we detected in our experiments. Our attempts thus far at experimental structure determination are consistent with this being a major technical hurdle.

      (7) SAXS data provide low-resolution structural information, and the interpretation of flexibility versus rigidification might be overemphasized in its interpretation. This part of the study was difficult to interpret. Improving readability by breaking down the text into sections with clear headings for each figure panel and clarifying descriptions of the panels and methods would help. Complementary high-resolution techniques could provide a more definitive view of the linker's conformational changes.

      We have modified the presentation of the figures to clarify the SAXS analysis. The fact that the SAXS analysis suggests flexibility rather than a discrete inactive conformation means that high-resolution techniques may not be appropriate for this system.

      (8) The study primarily focuses on the model where RsbT binding rigidifies the RsbU linker. Alternative hypotheses, such as subtle conformational adjustments without complete rigidification, are not extensively explored or ruled out.

      Our analysis of the SAXS data strongly suggests that a subtle conformational change could not account for the scattering data that we obtained. We have modified the text to clarify this point.

      “Indicative of significant deviation between the RsbU structure in solution and the AlphaFold2 model, the scattering intensity profile (I(q) vs. q) was a poor fit (χ<sup>2</sup> = 12.53) to a profile calculated from the AlphaFold2 model of an RsbU dimer using FoXS (Schneidman-Duhovny et al. 2016; Schneidman-Duhovny et al. 2013) (Fig. 4A). We therefore assessed the SAXS data for the RsbU dimer for features that report on flexibility (Kikhney & Svergun 2015). First, the scattering intensity data lacked distinct features caused by the multi-domain structure of RsbU from the AlphaFold2 model (Fig. 4A).”

      (9) Future studies should aim to validate the AlphaFold2 predictions with high-resolution structural techniques. This would provide definitive evidence for the proposed conformational states of RsbU with and without RsbT.

      The fact that the SAXS analysis suggests flexibility rather than a discrete inactive conformation means that high-resolution techniques may not be appropriate for this system.

      (10) Investigating the RsbU-RsbT interaction in vivo using techniques like FRET, co-immunoprecipitation, or live-cell imaging would provide a more comprehensive understanding of their functional dynamics in a cellular context.

      We appreciate the reviewer’s suggestions for future experiments.

      (11) Exploring and testing alternative models of RsbU activation, such as partial rigidification or different modes of conformational change, would strengthen the conclusions.

      While our data strongly support that a flexible-to-rigid transition controls RsbU activation, we agree that it is possible that other mechanisms of linker modification could control other phosphatases and we discuss this at some length in the discussion.

      (12) The figure legends are quite dense and could benefit from some streamlining.

      We have edited the figure legends for clarity and length.

      Reviewer #2 (Recommendations for the authors):

      (1) Activation assays (Figures 1, 3, S2) are presented here as blue or white spots (reflecting a reporter activity). While off and on these are fairly clear, it is more difficult to compare the degree of activity (for instance that rsbU<sup>Q94L</sup> is more active than M166V). It would also be good to clearly present in the text the logic of asking if the mutant is RsbT independent or not (and the interpretation of that). Quantitative assays of these would be very useful.

      We chose not to perform quantitative LacZ assays here because of several complications in interpreting these results that we encountered in our previously published study (Ho and Bradshaw, 2021). However, the level of blue pigmentation shown in Figure 1B for RsbU Q94L and RsbU M166V is qualitatively different, making the comparison possible. Most importantly, we observed cell density dependent changes in LacZ activity in the absence of rsbT for rsbU<sup>M166V</sup> expressing cells, meaning that comparisons between strains would be difficult. Additionally, we found that it was important to make a chromosomal replacement of rsbU to see the full effect of the M166V substitution. However, we were not able to construct a similar rsbU<sup>Q94L</sup> strain, likely because the high level of σ<sup>B</sup> activity is lethal (we were able to construct this strain when σ<sup>B</sup> was deleted but only obtained strains with additional loss-of-function mutations in RsbU when σ<sup>B</sup> was present).

      We have modified the text to explain the logic of identifying RsbT independent variants: “We previously conducted a genetic screen (Ho & Bradshaw 2021) to identify features of RsbU that are important for phosphatase regulation by isolating gain-of-function variants that are active in the absence of RsbT.”

      (2) Explain Figure S8 graphs: as much as Alphafold is now in use, the authors should provide some further explanation of what is shown here. Blue (low error) is good, presumably. What are the A, B, C, and D sections showing? Different parts of a given letter region (and between them)? What is the x-axis? Is the top-ranked model used in every case in the text? How different are these models? The Methods section could be used for some of this (but doesn't in its current form). This also becomes important for the models generated later in the paper (Figure S7), which look rather different here.

      We have modified figure S8 to include additional labels and have added structures with the pLDDT scores shown. We have additionally modified the figure legends and methods to provide the requested information.

      (3) Figure 1C, D, Figure S2: amino acid ends of linker domains could be shown (the text discusses residues 83-97 of the linker as a two-turn coiled coil; Q94 is pretty close to the end of this coiled-coil?). Figure S2 is even less clear; the positions of other amino acids would help, and/or an added sequence showing the full linker and coiled-coil region. Some explanation in the legend of Figure S2 of which positions readers should focus on for the full coiled-coil would be useful. How strong a coiled-coil prediction is there for this region?

      We have added the sequence of the coiled-coil regions to the figures with numbering. For these analyses we used the Socket2 program, which analyzes a PDB file to identify coiled-coil regions and thus does not provide a confidence score. However, inspection of the sequence and the confidence scores of the AlphaFold2 models indicates that the coiled-coil regions are not ideal, consistent with this being a regulatory feature.

      Is it clear that the fully inactive proteins are still properly folded and soluble?

      In the case of RsbU, our biophysical analysis indicates that the inactive form of the protein is soluble. While phosphatase activity is substantially reduced, our unpublished comparison of single- and multiple-turnover reactions in the absence of RsbT indicates that nearly all of the enzyme is active.

      Finally, are there other positions that would also be expected, from this model, to stabilize the coiled-coil and thus bypass the requirement for RsbT? If so, it would be good to test these. Is it the burial of amino acid at position 94 that is important, or the ability to form crossed helices?

      Because of how short the predicted coiled-coil region is, we did not identify any obvious positions that would likely have the same effect as Q94 substitution. We considered making helix-breaking mutations, which would be predicted to block RsbU activation, but favored analysis of the wildtype protein because of limitations in interpreting the effects of loss-of-function mutations.

      (4) Figure 2A, RsbT binding to RsbU: It was not entirely clear to this reviewer why one would expect the RsbT binding, not needed for activation, to be increased by the mutation that stabilizes the crossed alpha helices. The change is impressive, but doesn't the lack of a need for RsbT suggest that this mutation bypasses the normal mechanism? (Is dimerization enough? Or other protein cross helices?)

      We have modified the text to clarify this point: “One prediction of our hypothesis that RsbT stabilizes the crossed alpha helices of the RsbU dimer, is that RsbT should bind more tightly to rsbU<sup>Q94L</sup> than to RsbU because the coiled-coil conformation that RsbT binds would be more energetically favorable.” Another way of putting this is that if the Q94L substitution activates RsbU through an on-pathway mechanism, RsbT must bind more tightly.

      (5) Figure 3A, Figure S3: Please label the yellow (interface) residues in RsbU and RsbT in Fig. S3 and the green (suppressor) spheres in Figure 3A.

      We have added labels to the figures as suggested.

      If RsbT interacts with the N-terminal dimerization domain and linker, why were residues 174 and 178 (from the PPM domain) shown to be implicated in binding?

      The fact that residues in the switch region suppress a mutation that decreases RsbT binding suggests that this region is part of an allosteric network that links RsbT binding, the linker, and dimerization of the phosphatase domains. For example, any substitution that promotes a conformation of the phosphatase domain that is more favorable for dimerization would also promote RsbT binding. However, the precise details of how each mutation fits into this network are not clear, and we have therefore chosen not to specify a particular model to avoid overinterpreting our data.

      Are these marked in Figure S3?

      We have added labels to make this clear.

      Are these part of a dimerization interface in the C-terminal domain? Are any/all of these RsbU mutants suppressed by Q94L, as one might predict (apparently Y28I is since Q94L was again identified)?

      We chose to focus on Y28I because it was the best studied previously, but we would predict that Q94L would suppress other RsbT binding mutations.

      (6) Line 191-192: Is it surprising that no suppressors were isolated in RsbT?

      We didn’t have a preconception of whether or not it would be possible to identify similar suppressors in RsbT. Explanations for why we did not identify such suppressors could include that RsbT may be destabilized more easily by substitution, that RsbT is more constrained because it has other interaction partners, or that the particular substitutions that would suppress Y28I are less likely to arise from the PCR mutagenesis strategy we used.

      (7) Figure 3: Would the same mutants arise if the screen had been done in the absence of RsbT? Was RsbT dependence tested for the rsbU alleles?

      Our prediction is that we would not have identified any of these mutations except for Q94L in the absence of rsbT. We tested a few of the alleles and found them all to be rsbT dependent, but did not systematically test all of the alleles and therefore did not include this analysis in the manuscript.

      Given the findings earlier in the paper for Q94L, suggesting that this stabilizes the coiled-coil and shows some activity in the absence of RsbT, it seems that the interpretation of other mutants in this region (and Q94L itself) as evidence that RsbT contacts the linker directly and that contact is necessary for activation may be an overinterpretation. If these are in fact RsbT independent, they support the importance of the linker (do they further stabilize coiled-coil formation?), rather than the role of RsbT here. Are G92 and T89 on the outside of the coiled-coil? If Q94 is buried, is it qualitatively different from these others?

      G92 and T89 are predicted to be exposed. The fact that these mutations are near Q94 is part of the reason that we focused on R91 and the predicted contact with D92 of RsbT as another approach to validate the predicted interface.

      (8) Figure 3C addresses the issue of direct interaction of RsbT with the RsbU linker to some extent, given that RsbU R91E doesn't appear to have a lot of activity without RsbT. It would help to tell the reader at the outset what R91 is predicted to contact.

      We have modified the text to clarify this point: “To test the model that RsbT activates RsbU by directly interacting with the linker to dimerize the RsbU phosphatase domains, we introduced a charge swap at position R91 that would abolish a predicted salt-bridge with RsbT D92 (Fig. 3C).”

      (9) Figure 4 and the discussion of it in the text is not likely to be easily understandable for many readers. Aside from providing a bit more explanation of what these analyses are showing, it would be useful to start the whole section (or maybe even much earlier in the paper) with the information found on lines 261-264, that other studies show that the N-terminus dimerizes stably on its own (and is it known that the C-terminus does not?). Then the discussion of the alternative models early in this section would be clearer.

      We have updated the introduction to emphasize this point “RsbU has an N-terminal four-helix bundle domain that dimerizes RsbU and is also the binding site for RsbT, which activates RsbU as a phosphatase (Fig. 1C,D) (Delumeau et al. 2004).”

      We have also added clarification to the model presented at the beginning of this section: “A second possibility is that inactive RsbU is dimerized by the N-terminal domains but that the linkers of inactive RsbU are flexible and that the phosphatase domains only interact with each other when RsbT orders the linkers into a crossing conformation.”

      Is the previously determined dimerization of the N-terminal domains similar to, or the same as, what is seen in the AlphaFold models used here (or is the AlphaFold dimerization derived primarily from those data)?

      Yes, the dimerization in the AlphaFold models matches closely to the published structure.

      (10) Discussion and Figure 5: The final part of this work predicts AlphaFold models for a set of other phosphatases involved in initiating GSR across bacterial species, and suggests that linker-mediated phosphatase dimerization is the critical factor to activate the phosphatase. Clearly, this is the most speculative but interesting aspect of the paper. A number of possible questions are suggested by some of this:

      a. Do any of the activating mutants in RsbU and RsbP in the PPM domain (that apparently improve dimerization and thus activation) do a similar job in the other modeled proteins?

      This is an interesting question, but unfortunately most of these proteins have not been biochemically characterized. We highlight examples of RsbP and E. coli RssB for which similar activating mutations have been characterized.

      b. The legend (Figure 5G) suggests that all of the linker combinations will be coiled-coils, but that they will undergo different types of activating (and dimerizing?) transitions. Is that in fact what is being proposed here?

      Yes, this is our working hypothesis.

      c. If there is no dimerization (as noted, only weak dimerization has been reported for E. coli RssB), does that generalize the model to one in which the linkers and their structures are what matter? At the least, would the folding up of the E. coli RssB linker upon antiadaptor binding be considered another mode of signal transduction, or rather some sort of storage form?

      Interestingly, P. aeruginosa RssB constitutively dimerizes, suggesting that E. coli RssB is the outlier.

      d. Would the "toolkit" model, in which different changes occur in the linker regions, suggest that the interacting proteins are going to be critical for the type of linker changes that will be important? Or something about the nature of the linkers themselves?

      This is an interesting question that we cannot yet answer. We have chosen to focus on the possible flexibility of this mechanism and anticipate that a variety of mechanisms will be used.

      e. Given the extensive comparison to E. coli RssB, the authors might consider a figure to clarify the relative domain architecture, sequences that are akin to switch regions, and others important to the discussion here.

      We tried to highlight this in Figure 5C including coloring the regions similar to the switch regions.

      Reviewer #3 (Recommendations for the authors):

      Given the caveats noted above related to the reliability of computed structure models, I would recommend the authors make the following additions/modifications to their manuscript:

      (1) The authors should show AlphaFold models coloured by pLDDT scores in an additional supplementary figure to help the reader interpret the confidence level of the predicted structures.

      We have added these models to figure 1 – figure supplement 2.

      (2) Because of the points mentioned above the authors should tone down the generalisation relating to the activation mechanism of this family of phosphatases presented in the discussion.

      We have modified the paper throughout to emphasize where we are speculating.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1:

      Summary:

      Kimura et al. performed a saturation mutagenesis study of CDKN2A to assess the functionality of all possible missense variants and compared them to previously identified pathogenic variants. They also compared their assay results with those from in silico predictors.

      Strengths:

      CDKN2A is an important gene that modulates the cell cycle and apoptosis; therefore, it is critical to accurately assess the functionality of missense variants. Overall, the paper reads well and touches upon major discoveries in a logical manner.

      Weaknesses:

      The paper lacks proper details for experiments and basic data, leaving the results less convincing. Analyses are superficial and do not provide variant-level resolution. Many of these issues were addressed during the revision process.

      Comments on revisions:

      The manuscript was improved during the revision process.

      We thank the reviewer for their comments. We are grateful for the opportunity to provide additional information and data to clarify our approach and study results.

      Reviewer #2:

      Summary:

      This study describes a deep mutational scan across CDKN2A using suppression of cell proliferation in pancreatic adenocarcinoma cells as a readout for CDKN2A function. The results are also compared to in silico variant predictors currently utilized by diagnostic frameworks to gauge these predictors' performance. The authors also functionally classify CDKN2A somatic mutations in cancers across different tissues.

      Review:

      The goal of this paper was to perform functional classification of missense mutations in CDKN2A in order to generate a resource to aid in clinical interpretation of CDKN2A genetic variants identified in clinical sequencing. In our initial review, we concluded that this paper was difficult to review because there was a lack of primary data and experimental detail. The authors have significantly improved the clarity, methodological detail and data exposition in this revision, facilitating a fuller scientific review. Based on the data provided, we do not think the functional characterization of CDKN2A variants is robust or complete enough to meet the stated goal of aiding clinical variant interpretation. We think the underlying assay could be used for this purpose, but different experimental design choices and more replication would be required for these data to be useful. Alternatively, the authors could focus on novel CDKN2A variants, as there appear to be potential gain-of-function mutations that are simply lumped into "neutral" and that may have important biological implications.

      Major concerns:

      Low experimental concordance. The p-value scatter plot (Figure 2 Figure Supplement 3A) across 560 variants shows low collinearity, indicating poor replicability. These data should be shown as log2 fold changes, but even after model fitting with the gamma GLM they still show low concordance, which casts strong doubt on the function scores.

      Concordance among non-significant p-values is generally low because most of the signal comes from random variability across repeats. If the observed log2 fold change between the repeats is entirely due to noise, one would expect two repeated p-values to behave like independent uniform random variables. True concordance is typically more evident in significant p-values because they reflect consistent effects above random noise. Functionally deleterious variants are called when their associated p-value is significant. To confirm this statement, a scatter plot of the log2 normalized fold changes was added in Figure 2 Supplement 3C. We see low concordance between repeats in the log2 normalized fold changes centered around 0, corresponding to log2 normalized changes mainly due to noise. The concordance increases as the variants become significant. Notably, the correlation coefficient between duplicate assay results was almost identical for the model-based p-values and the log2 normalized fold changes (Figure 2-figure supplement 3A and 3C, Appendix 1-table 4, and Appendix 1-table 6). Importantly, no variant was functionally deleterious in one replicate and functionally neutral in another, implying perfect concordance in calls if we exclude variants that were called indeterminate in one of the two repeats. Finally, of variants with discordant classifications, only 6/560 (1.1%) were functionally deleterious (significant p-value) in one replicate and of indeterminate function in another. We have updated the text as follows:

      “Of variants with discordant classifications, 6 (1.1%) were functionally deleterious in one replicate and of indeterminate function in another, while 102 variants (18.2%) were functionally neutral in one replicate and of indeterminate function in another. Importantly, no variant was functionally deleterious in one replicate and functionally neutral in another (Appendix 1 -table 4). Furthermore, the correlation coefficient between duplicate assay results was similar using the gamma GLM and log2 normalized fold change (Figure 2-figure supplement 3A and 3C).”
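      The argument that null p-values behave like independent uniforms, while true effects give concordant significant calls, can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the authors' gamma GLM: each replicate's standardized score is taken to be a true effect plus independent unit-variance normal noise, and the effect size (4.0), variant counts, and alpha of 0.05 are arbitrary illustrative choices.

```python
import math
import random


def two_sided_p(z):
    # Two-sided p-value for a standard normal test statistic z.
    return math.erfc(abs(z) / math.sqrt(2.0))


def pearson(x, y):
    # Plain Pearson correlation coefficient between two equal-length lists.
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_replicates(n_null=2000, n_effect=200, effect=4.0, seed=0):
    # Assumed model: replicate score = true effect + independent N(0, 1) noise.
    # Null variants have effect 0, so their p-values are independent uniforms.
    rng = random.Random(seed)

    def pair(true_effect):
        return (two_sided_p(true_effect + rng.gauss(0.0, 1.0)),
                two_sided_p(true_effect + rng.gauss(0.0, 1.0)))

    null_pairs = [pair(0.0) for _ in range(n_null)]
    effect_pairs = [pair(effect) for _ in range(n_effect)]
    return null_pairs, effect_pairs


def fraction_significant_in_both(pairs, alpha=0.05):
    # Fraction of variants called significant in both replicates.
    return sum(1 for p1, p2 in pairs if p1 < alpha and p2 < alpha) / len(pairs)


null_pairs, effect_pairs = simulate_replicates()
r_null = pearson([p for p, _ in null_pairs], [p for _, p in null_pairs])
print(f"correlation of null p-values across replicates: {r_null:.3f}")
print(f"null variants significant in both: {fraction_significant_in_both(null_pairs):.3f}")
print(f"true-effect variants significant in both: {fraction_significant_in_both(effect_pairs):.3f}")
```

      In this toy setting, the null p-values are essentially uncorrelated between replicates, yet almost every true-effect variant is called significant in both, which is the pattern the response describes.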

      The more detailed methods provided indicate that the growth suppression experiment is done in 156 pools, with each pool consisting of the 20 variants corresponding to one of the 156 aa positions in CDKN2A. There are several serious problems with this design.

      Batch effects in each of the pools prevent comparison across different residues. We think this is a serious design flaw and not standard for how these deep mutational scans are done. The standard would be to combine all 156 pools in a single experiment. Given the sequencing strategy of dividing up CDKN2A into 3 segments, the 156 pools could easily have been collapsed into 3 (1 to 53, 54 to 110, 111 to 156). This would significantly minimize variation in handling between variants at each residue and would be more manageable for performance of further replicates of the screen for reproducibility purposes. The huge variation in confluency time across pools (16-40 days) suggests that this batch effect is a strong source of variation in the experiment.

      While there is variation in time to confluency between different amino acid residues, we do not anticipate this batch effect to significantly affect variant classifications in our study. For example, our results were generally consistent with previous classifications. All synonymous variants (one per residue) and benchmark benign variants assayed were classified as functionally neutral. Furthermore, of benchmark pathogenic variants assayed, none were classified as functionally neutral: 84% were classified as functionally deleterious and 16% as of indeterminate function.

      Lack of experimental/biological replication: The functional assay was only performed once on all 156 CDKN2A residues and was repeated for only 28 out of 156 residues, with only ~80% concordance in functional classification between the first and second screens. This is not sufficiently robust for variant interpretation. Why was the experiment not performed more than once for most aa sites?

      In our study we determined functional classifications for all CDKN2A missense variants while assessing variability with replicates across 28 residues. Of these variants, only 6 (1.1%) were functionally deleterious in one replicate and of indeterminate function in another. Furthermore, no variant was functionally deleterious in one replicate and functionally neutral in another (Appendix 1 -table 4).  As noted above, we provided additional context in the manuscript.

      For the screen, the methods section states that PANC-1 cells were infected at MOI=1, while the standard is an MOI of 0.3-0.5 to minimize multiple variants integrating into a single cell. At an MOI of 1, under a Poisson model of viral integration, ~25% of cells would have more than 1 lentiviral integrant. So in ~25% of the cells the effect of a variant would be confounded by one or more other variants, adding noise to the assay.

      As noted previously, we are not able to differentiate effects due to multiple viral integrations per cells. However, we do not anticipate multiple viral integrations to significantly affect variant classifications in our study as our results are consistent with previous classifications, as described above.
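      The reviewer's multiple-integration estimate can be checked directly, assuming, as the reviewer does, that integrants per cell follow a Poisson distribution with mean equal to the MOI. The sketch below reports the fraction of all cells and, separately, the fraction of transduced cells carrying more than one integrant; the latter is the relevant figure if untransduced cells are removed, which is an assumption about the protocol rather than a detail stated here.

```python
import math


def poisson_pmf(k, moi):
    # Probability that a cell receives exactly k integrants at mean rate `moi`.
    return math.exp(-moi) * moi ** k / math.factorial(k)


def integration_summary(moi):
    p0 = poisson_pmf(0, moi)
    p1 = poisson_pmf(1, moi)
    multiple = 1.0 - p0 - p1
    return {
        "uninfected": p0,
        "single": p1,
        "multiple": multiple,
        # Conditional on being transduced (at least one integrant).
        "multiple_among_transduced": multiple / (1.0 - p0),
    }


for moi in (0.3, 0.5, 1.0):
    s = integration_summary(moi)
    print(f"MOI {moi}: {s['multiple']:.1%} of all cells, "
          f"{s['multiple_among_transduced']:.1%} of transduced cells carry >1 integrant")
```

      At MOI 1 this gives about 26% of all cells but about 42% of transduced cells with more than one integrant; at MOI 0.3 the transduced-cell figure falls to about 14%.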

      While the authors provide more explanation of the gamma GLM, we strongly advise that the heatmap and replicate correlations be shown with the log2 fold changes rather than the fit output of the p-values.

      Thank you for the suggestion. As noted, we provide additional explanation in the manuscript about why we classified variants using a gamma GLM. Using a gamma GLM, classification thresholds were determined from the change in representation of 20 non-functional barcodes in a pool of PANC-1 cells stably expressing CDKN2A after a period of in vitro proliferation. Our variant classifications therefore did not rely on assay outputs for previously reported (benchmark) pathogenic or benign variants to determine thresholds. We strongly prefer using p-values and classifications from the gamma GLM in the manuscript. However, comparisons of assay outputs using the gamma GLM and log2 fold change are included in the manuscript. Read counts, log2 fold changes, and classifications based on log2 fold change are presented in the manuscript for all variants. Readers who wish to use these data may do so, and we refer them to the manuscript text, Appendix 1 -table 4, Appendix 1 -table 6, and Figure 2 -figure supplement 2.
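      For readers who want to work from the read counts rather than the gamma GLM output, a log2 fold change can be computed as below. This is a minimal sketch, not the authors' pipeline: the 0.5 pseudocount, the depth normalization to total reads per sample, and the hypothetical `normalize_to_reference` step (for example, against that residue's synonymous variant in the same pool) are all assumptions.

```python
import math


def log2_fold_change(count_start, total_start, count_end, total_end, pseudocount=0.5):
    # Convert raw read counts to within-sample fractions so that differences
    # in sequencing depth between time points cancel out. The pseudocount
    # avoids taking the log of zero for variants with no reads.
    frac_start = (count_start + pseudocount) / total_start
    frac_end = (count_end + pseudocount) / total_end
    return math.log2(frac_end / frac_start)


def normalize_to_reference(variant_lfcs, reference_lfc):
    # Hypothetical within-pool normalization: express each variant's log2 fold
    # change relative to a reference variant assayed in the same pool.
    return {name: lfc - reference_lfc for name, lfc in variant_lfcs.items()}


# Example with made-up counts: a variant going from 1% to 4% of reads.
lfc = log2_fold_change(100, 10_000, 400, 10_000, pseudocount=0)
print(lfc)
print(normalize_to_reference({"variant_A": 2.0, "variant_B": 0.5}, 0.5))
```

      Under the assay logic described in these reviews, positive values indicate enrichment (consistent with loss of growth suppression), while strongly negative values are the pattern raised below as possible gain of function.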

      In this study, the authors only classify variants into the categories "neutral", "indeterminate", or "deleterious" but they do not address CDKN2A gain-of-function variants that may lead to decreased proliferation. For example, there is no discussion on variants at residue 104, whose proliferation values mostly consist of higher magnitude negative log2fold change values. These variants are defined as neutral but from the one replicate of the experiment performed, they appear to be potential gain-of-function variants.

      We have added a comment to the discussion to highlight that we did not identify potential gain-of-function variants. Specifically:

      “We classified CDKN2A missense variants using a gamma GLM as functionally deleterious, of indeterminate function, or functionally neutral. However, we did not classify variants that may have gain-of-function effects, resulting in decreased representation in the cell pool. Future studies are necessary to determine the prevalence and significance of CDKN2A gain-of-function variants.”

      Minor concerns:

      The differentiation between variants of "neutral" and "indeterminate" function seems unnecessary and it seems like there are too many variants that fall into the "indeterminate" category. The authors seem to have set numerical thresholds for CDKN2A function using benchmark variants of known function. While the benchmark variants are important as a frame of reference for the "dynamic range" of the assay, their function scores should not necessarily be used to define hard cutoffs of whether a variant's function score can be interpreted.

      We did not utilize benchmark variants to define thresholds for functional classifications using a gamma GLM. This is one of the strengths of using a gamma GLM model for classification. As explained in our manuscript, classification thresholds were determined from the change in representation of 20 non-functional barcodes in a pool of PANC-1 cells stably expressing CDKN2A after a period of in vitro proliferation. Our variant classifications were therefore not based on assay outputs for previously reported (benchmark) pathogenic or benign variants. While not required when using a gamma GLM, we included indeterminate classifications, which are not uncommon.

      Figure 2 supplement 2 - on the x-axis, should "intermediate" be "indeterminate"?

      This, and a similar typographical error in Figure 2 -figure supplement 3, has been corrected.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public Review):

      This study elucidates the toxic effects of the lipid aldehyde trans-2-hexadecenal (t-2-hex). The authors show convincingly that t-2-hex induces a strong transcriptional response, leads to proteotoxic stress and causes the accumulation of mitochondrial precursor proteins in the cytosol.

      The data shown are of high quality and well-controlled. The genetic screen for mutants that are hyper- and hypo-sensitive to t-2-hex is elegant and interesting, even if the mechanistic insights from the screen are rather limited. Moreover, the authors show evidence that t-2-hex affects subunits of the TOM complex. However, they do not formally demonstrate that the lipidation of a TOM subunit is responsible for the toxic effect of t-2-hex. A t-2-hex-resistant TOM mutant was not identified. Nevertheless, this is an interesting and inspiring study of high quality. The connection of proteostasis, mitochondrial biogenesis and sphingolipid metabolism is exciting and will certainly lead to many follow-up studies.

      Reviewer #3 (Public Review):

      Summary:

      The authors investigate the effect of high concentrations of the lipid aldehyde trans-2-hexadecenal (t-2-hex) in a yeast deletion strain lacking the detoxification enzyme. Transcriptomic analyses as a global readout reveal that a large range of cellular functions across all compartments are affected (transcriptomic changes affect 1/3 of all genes). The authors provide additional analyses, from which they build a model in which mitochondrial protein import is blocked by modification of Tom40.

      Our initial transcriptomic study, using high doses of t-2-hex in a detoxification mutant, was only a starting experiment aimed at identifying as many determinants of t-2-hex toxicity as possible, as stated in the manuscript. From this, we developed multiple independent approaches in wild-type (and mutant) cells at low t-2-hex concentrations, demonstrating that proteostasis and mitochondrial protein trafficking are physiologically important targets of the pro-apoptotic lipid. Specifically, proteostasis-specific PACE reporters are robustly induced in a detoxification mutant by 5 μM t-2-hex (Figure 3D,E) and significantly induced by 10 μM t-2-hex in detoxification-competent wild-type cells (new Figure 3F).

      We do not propose Tom40 as the lipid's primary target; rather, we show that several subunits of the TOM (and TIM) complex are directly targeted by low t-2-hex concentrations in vitro (Figure 8B), and that Tom20 and Tom70 are important for lipid toxicity (Figure 8D) and mitochondrial protein trafficking in vivo (Suppl. Figure 2).

      Strengths:

      Global analyses (transcriptomic and functional genomics approach) to obtain an overview of changes upon yeast treatment with high doses of t-2-hex.

      Weaknesses:

      The use of high concentrations of t-2-hex in combination with a deletion of the detoxifying enzyme Hfd1 limits the possibility of identifying physiologically relevant changes. From the hundreds of identified targets, the authors focus on mitochondrial proteins, a choice that is not clearly justified by the data.

      The initial transcriptomic study with high doses of t-2-hex in a detoxification mutant is a starting experiment aimed at identifying as many determinants of t-2-hex toxicity as possible, as stated in the manuscript. As stated (page 4), genes up-regulated (>2 log2FC) by t-2-hex were selected and subjected to GO category enrichment analysis (Supplemental Table 1). We found that “Mitochondrial organization” was the largest GO group activated by t-2-hex. Among the strongly t-2-hex-induced genes encoding mitochondrial proteins, CIS1 was the most inducible gene with a known mitochondrial function. Cis1 is the central protein of the MitoCPR pathway, which is specifically induced upon, and protects from, mitochondrial protein import stress. We further show in several independent experimental approaches that proteostasis and mitochondrial protein trafficking are physiologically important targets at low t-2-hex doses: proteostasis-specific PACE reporters are robustly induced in a detoxification mutant by 5 μM t-2-hex (Figure 3D,E) and significantly induced by 10 μM t-2-hex in detoxification-competent wild-type cells (new Figure 3F); mitochondrial pre-protein accumulation is induced by 10 μM t-2-hex in wild-type cells (Figure 5G); several subunits of the TOM and TIM complexes are lipidated by low (10 μM) t-2-hex doses in wild-type cell extracts (Figure 8B); and mitochondrial import assays with mt-GFP in intact wild-type yeast cells reveal that t-2-hex significantly inhibits import at low (5 μM) concentrations (new Suppl. Figure 1). The 5-10 μM t-2-hex applied here is considerably lower than the ≥25 μM used in published work on intact human cells or cell extracts (Jarugumilli et al. 2018).

      The main claim of the manuscript that t-2-hex targets the TOM complex and inhibits mitochondrial protein import is not supported by experimental data as import was not experimentally investigated. The observed accumulation of precursor proteins could have many other reasons (e.g. dissipation of membrane potential, defects in mitochondrial presequence proteases, defects in cytosolic chaperones, modification of mitochondrial precursors by t-2-hex rendering them aggregation prone and thus non-import competent). However, none of these alternative explanations have been experimentally addressed or discussed in the manuscript.

      We have now performed additional experiments, alternative to the pre-protein quantifications, showing that t-2-hex specifically inhibits mitochondrial protein import. We investigated the effect of t-2-hex on mitochondrial protein import using flow cytometric GFP assays in live yeast cells. Specifically, we compared the expression and maturation of GFP targeted either to the cytosol or the mitochondrial matrix and show that low doses of t-2-hex (≥5 μM) significantly inhibited mt-GFP activity compared to cytosolic GFP in wild-type cells (new Supplemental Figure 1B). In contrast, this inhibition was not observed with the saturated derivative, t-2-hex-H2. Flow cytometric rhodamine123 assays revealed that t-2-hex did not alter ΔΨm within the concentration range that efficiently inhibits mt-GFP activity (new Supplemental Figure 1C). Alternative t-2-hex effects such as the direct modification of mitochondrial pre-proteins or cytosolic chaperones, potentially making the precursors prone to aggregation, are less likely, as the mitochondrial and cytosolic GFP used in these import studies differ only by the small, cysteine-free PreSu9 pre-peptide. This information is now included in the Results and Discussion sections.
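      The logic of comparing mt-GFP with cytosolic GFP can be expressed as a normalized ratio: if t-2-hex only lowered general expression, both signals would drop together and the ratio would stay near 1. The sketch below is hypothetical; the function name, the use of median fluorescence per population, and the example values are assumptions for illustration, not the authors' analysis or data.

```python
def import_index(mt_treated, mt_untreated, cyto_treated, cyto_untreated):
    """Hypothetical normalized import index from flow cytometry medians.

    Values near 1 mean mt-GFP changes in step with cytosolic GFP;
    values below 1 mean mt-GFP signal drops more than cytosolic GFP,
    as expected if mitochondrial import is selectively impaired.
    """
    for value in (mt_treated, mt_untreated, cyto_treated, cyto_untreated):
        if value <= 0:
            raise ValueError("fluorescence values must be positive")
    mt_ratio = mt_treated / mt_untreated        # treated vs untreated, mito-targeted GFP
    cyto_ratio = cyto_treated / cyto_untreated  # treated vs untreated, cytosolic GFP
    return mt_ratio / cyto_ratio


# Made-up medians: mt-GFP halves while cytosolic GFP drops by 10%.
print(import_index(500, 1000, 900, 1000))
```
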

      Furthermore, many of the results have been reported before (interaction of Tom22 and Tom70 with Hfd1) or observed before (TOM40 as target of t-2-hex in human cells).

      The interaction of Tom22 or Tom70 with Hfd1 has been only reported in high throughput pull-down studies in yeast (Opalinski et al., 2018 and Burri et al., 2006), and no functional connection between Hfd1 lipid detoxification and TOM has been investigated. Here we corroborate these high throughput results by targeted pull-down experiments, which strengthens the new finding that Hfd1 functionally interacts with the TOM complex. Tom40 has been found to be lipidated by high t-2-hex concentrations in human cell extracts in high throughput in vitro proteomic studies (Jarugumilli et al., 2018), but no functional connection between human TOM and t-2-hex has been investigated so far. Here we corroborate these high throughput results by targeted experiments, which strengthens the new findings that t-2-hex and TOM interact functionally.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Congratulations on this exciting study. Even if some of the mechanistic details will have to be addressed in further studies (which of the modified sites are physiologically relevant; which sites are modified in vivo without external addition of t-2-hex) this study is inspiring and opens a new direction of mitochondrial research. I therefore fully support publication of this nice study in its current form.

      Reviewer #3 (Recommendations For The Authors):

      Two of the reviewers pointed out that the observation of precursors in whole cell extract is not sufficient to draw conclusions on mitochondrial protein import rates. The authors did not provide any new experiments but argued that a recent publication (Weidberg and Amon, 2018) had used the same readout for this conclusion. Why this manuscript was accepted with this statement is not known to this reviewer, but it does not change the fact, that the conclusion is not valid. Many alternative explanations are possible (see public review) and the claim that the import competence of the TOM complex is affected upon t-2-hex treatment is not appropriate.

      We have now performed new experiments addressing the inhibition of mitochondrial protein import by t-2-hex as an alternative to our precursor accumulation assays. We compared the induced expression of cytosolic and mitochondrial GFP by flow cytometry as a quantitative mitochondrial import assay (Sirk et al., Cytometry A. 2003 Nov; 56(1) 15-22). Low doses of t-2-hex (≥5 μM) significantly inhibited mt-GFP activity as compared to cytosolic GFP in wild-type cells (new Supplemental Figure 1B). This inhibition of mitochondrial GFP is independent of mitochondrial membrane potential perturbation (new Supplemental Figure 1C) and alternative t-2-hex effects, such as the direct modification of the mtGFP precursor or cytosolic chaperones are less likely, as the mitochondrial and cytosolic GFP used in these import studies differ only by the small, cysteine-free PreSu9 pre-peptide.

      The first sentence of the abstract states that t-2-hex “induces mitochondrial dysfunction in a conserved manner from yeast to human”. I find two issues with this statement: 1) if the mechanism is known, what is the question addressed in the present manuscript, and 2) the second sentence of the results fully contradicts the above sentence: “In human cells, t-2-hex causes mitochondrial dysfunction by directly stimulating Bax-oligomerisation at the outer mitochondrial membrane. In yeast, however, t-2-hex efficiently interferes with mitochondrial function and cell growth in a Bax independent manner.”

      We agree that the first sentence was misleading, this has been fixed now in the revised version.

      The first reviewer requested a repetition of key experiments with lower concentrations and the authors provided additional in vitro data; however, for this, 10 μM is still very high. To gain valuable and physiologically relevant data, the initial transcriptomic analysis should be repeated with a low amount and in a wild-type yeast background.

      Published t-2-hex chemoproteomic experiments on human cell extracts were performed at higher concentrations (>25 μM), and human Bax is hardly lipidated by 10 μM t-2-hex (Jarugumilli et al., 2018); therefore, the in vitro lipidation data provided in our study should be considered a low t-2-hex dose. The initial transcriptomic study with high doses of t-2-hex in a detoxification mutant is a starting experiment and was aimed at identifying as many determinants of t-2-hex toxicity as possible. Building on this, we further show in several independent experimental approaches that proteostasis and mitochondrial protein trafficking, the relevant cellular functions for our study, are physiologically important targets at low t-2-hex doses: proteostasis-specific gene expression is robustly induced in a detoxification mutant by 5 μM t-2-hex (Figure 3D,E) and significantly induced by 10 μM t-2-hex in detoxification-competent wild-type cells (new Figure 3F); mitochondrial pre-protein accumulation is induced by 10 μM t-2-hex in wild-type cells (Figure 5G); several subunits of the TOM and TIM complexes are lipidated by low (10 μM) t-2-hex doses in vitro in wild-type extracts (Figure 8B); and mitochondrial import assays with mt-GFP in intact wild-type yeast cells reveal that t-2-hex significantly inhibits import at low (5 μM) concentrations (new Suppl. Figure 1).

      As already stated above there are many alternative explanations for the observed accumulation of precursor proteins, e.g. the decreased proteasome activity could be cause and not consequence. Also, the modification of precursors directly upon translation in the cytosol could likely impact on their further transport and result in direct aggregation in the cytosol.

      As mentioned above, we have now corroborated the t-2-hex-specific mitochondrial protein import defect by alternative in vivo experiments that do not depend on the accumulation of mitochondrial precursors. We have now also tested the possibility that decreased proteasome activity could indirectly inhibit mitochondrial import. This is not the case, because an rpn4 mutant with impaired proteasomal activity induces normal mtGFP levels (new Suppl. Figure 1D). We cannot exclude that the modification of precursors by t-2-hex upon translation might additionally impact the transport of some mitochondrial pre-proteins. However, the mitochondrial and cytosolic GFP used in the import studies differ only in the small, cysteine-free PreSu9 pre-peptide, making it very unlikely that precursor lipidation is secondarily responsible for the observed import defect.

      Many of the comments after first reviewing the manuscript were not addressed experimentally although many of the suggested experiments are easy to perform. I can only encourage the authors to provide more experimental support and controls, as the claims are currently not sufficiently supported.

      In the two revisions of our manuscript, we have included several control experiments to better link the pro-apoptotic lipid t-2-hex with mitochondrial import stress. These include: in vitro lipidation of TOM/TIM subunits by low t-2-hex concentrations; t-2-hex tolerance and recovery of mitochondrial protein import in specific tom mutants; inhibition of mitochondrial protein import (pre-protein and mtGFP assays) by low t-2-hex doses, independently of mitochondrial membrane potential and proteasome activity; and induction of proteostasis-specific gene expression by low t-2-hex doses.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors have developed self-amplifying RNAs (saRNAs) encoding additional genes to suppress dsRNA-related inflammatory responses and cytokine release. Their results demonstrate that saRNA constructs encoding anti-inflammatory genes effectively reduce cytotoxicity and cytokine production, enhancing the potential of saRNAs. This work is significant for advancing saRNA therapeutics by mitigating unintended immune activation.

      Strengths:

      This study successfully demonstrates the concept of enhancing saRNA applications by encoding immune-suppressive genes. A key challenge for saRNA-based therapeutics, particularly for non-vaccine applications, is the innate immune response triggered by dsRNA recognition. By leveraging viral protein properties to suppress immunity, the authors provide a novel strategy to overcome this limitation. The study presents a well-designed approach with potential implications for improving saRNA stability and minimizing inflammatory side effects.

      We thank Reviewer #1 for their thorough review and for recognizing both the significance of our work and the potential of our strategy to expand saRNA applications beyond vaccines.

      Weaknesses:

      (1) Impact on Cellular Translation:

      The authors demonstrate that modified saRNAs with additional components enhance transgene expression by inhibiting dsRNA-sensing pathways. However, it is unclear whether these modifications influence global cellular translation beyond the expression of GFP and mScarlet-3 (which are encoded by the saRNA itself). Conducting a polysome profiling analysis or a puromycin labeling assay would clarify whether the modified saRNAs alter overall translation efficiency. This additional data would strengthen the conclusions regarding the specificity of dsRNA-sensing inhibition.

      We thank the reviewer for this helpful insight and suggestion. We aim to conduct a puromycin labelling assay to clarify the effect of the various saRNA constructs on translation efficiency.

      (2) Stability and Replication Efficiency of Long saRNA Constructs:

      The saRNA constructs used in this study exceed 16 kb, making them more fragile and challenging to handle. Assessing their mRNA integrity and quality would be crucial to ensure their robustness.

      Furthermore, the replicative capacity of the designed saRNAs should be confirmed. Since Figure 4 shows lower inflammatory cytokine production when encoding srIkBα and srIkBα-Smad7-SOCS1, it is important to determine whether this effect is due to reduced immune activation or impaired replication. Providing data on replication efficiency and expression levels of the encoded anti-inflammatory proteins would help rule out the possibility that reduced cytokine production is a consequence of lower replication.

      This is another very helpful comment. We will conduct an analysis of saRNA integrity and quality by denaturing gel electrophoresis. To examine replicative capacity of the saRNA constructs, we aim to conduct RT-qPCR experiments.

      (3) Comparative Data with Native saRNA:

      Including native saRNA controls in Figures 5-7 would allow for a clearer assessment of the impact of additional genes on cytokine production. This comparison would help distinguish the effect of the encoded suppressor proteins from other potential factors.

      Thank you for your suggestion. We will implement this change in the next version of the manuscript.

      (4) In vivo Validation and Safety Considerations:

      Have the authors considered evaluating the in vivo potential of these saRNA constructs? Conducting animal studies would provide stronger evidence for their therapeutic applicability. If in vivo experiments have not been performed, discussing potential challenges (such as saRNA persistence, biodistribution, and possible secondary effects) would be valuable.

      (5) Immune Response to Viral Proteins:

      Since the inhibitors of dsRNA-sensing proteins (E3, NSs, and L*) are viral proteins, they would be expected to induce an immune response. Analyzing these effects in vivo would add insight into the applicability of this approach.

      We recognize the importance of in vivo studies and immune cell responses and plan to incorporate in vivo imaging in future studies to investigate these interactions, as well as examining delivery of various cargoes via saRNA to determine potential therapeutic benefits in different animal models of inflammatory pain, but such studies are beyond the scope of this current investigation. As suggested by the reviewer, we will incorporate a section on potential challenges of in vivo saRNA work in the revised manuscript.

      (6) Streamlining the Discussion Section:

      The discussion is quite lengthy. To improve readability, some content (such as the rationale for gene selection) could be moved to the Results section. Additionally, the descriptions of Figure 3 should be consolidated into a single section under a broader heading for improved coherence.

      Thank you for your suggestions, we will make these changes in the next revision.

      Reviewer #2 (Public review):

      Summary:

      Lim et al. have developed a self-amplifying RNA (saRNA) design that incorporates immunomodulatory viral proteins, and show that the novel design results in enhanced protein expression in vitro in mouse primary fibroblast-like synoviocytes. They test constructs including saRNA with the vaccinia virus E3 protein and another with E3, Toscana virus NS protein and Theiler's virus L protein (E3 + NS + L), and another with srIκBα-Smad7-SOCS1. They have also tested whether ML336, an antiviral, enables control of transgene expression.

      Strengths:

      The experiments are generally well-designed and offer mechanistic insight into the RNA-sensing pathways that confer enhanced saRNA expression. The experiments are carried out over a long timescale, which shows the enhanced effect of the saRNA E3 design compared to the control. Furthermore, the inhibitors are shown to maintain the cell number and reduce basal activation factor-α levels.

      We thank Reviewer #2 for their detailed assessment and recognition of the mechanistic insights provided by our study.

      Weaknesses:

      One limitation of this manuscript is that the RNA is not well characterized; some of the constructs are quite long and the RNA integrity has not been analyzed. Furthermore, for constructs with multiple proteins, it's imperative to confirm the expression of each protein to confirm that any therapeutic effect is from the effector protein (e.g. E3, NS, L). The ML336 was only tested at one concentration; it is standard in the field to do a dose-response curve. These experiments were all done in vitro in mouse cells, thus limiting the conclusion we can make about mechanisms in a human system.

      We agree that these are weaknesses of our work. We plan to address some of these weaknesses by performing a dose response curve for ML336, examining saRNA integrity through denaturing gel electrophoresis, and will also aim to provide additional evidence for effects of effector proteins through RT-qPCR. We are also looking into testing these constructs in patient-derived FLS.

    1. Author response:

      Thanks for the positive review of our manuscript and for appreciating our work.

      We agree in many ways with the reviewers' comments. Our initial finding concerning the slight shift of f_free in a/b neurons after conditioning is interesting, but we agree it would certainly deserve a follow-up to substantiate its link with memory formation. We also agree that an analysis of distributions rather than of an averaged signal might be more sensitive.

      However, we have to cope with the fact that extending our investigation would require manpower resources that are no longer available. Therefore, we appreciate the suggestion made by the 3 reviewers to restrain the claim and hence change the title to "In vivo NAD(P)H autofluorescence lifetime imaging reveals metabolic heterogeneity within the Drosophila mushroom body". We find that it better matches the scope of this study, which is mostly to showcase the potential of NAD(P)H FLIM to quantify variations in metabolism in the Drosophila brain rather than to firmly test a specific hypothesis linked to memory formation. In this respect, we do provide quantitative results showing metabolic profile variations between brain tissues, such as the somata and calyx regions, but also between different Kenyon cell subtypes. We would then present the shifts of f_free induced by conditioning as a curio that might entice future work, as advised by Reviewer #2.

      Altogether, in the revised version we will change the title to restrain the claim, and move two supplementary figures into the main figures to better focus on and describe the registration process. We will also correct the figure panels pointed out by the reviewers and add individual samples to our boxplots. We will also slightly compress the introduction and expand the discussion on potential applications. Finally, we will evaluate whether statistical tests based on distributions may be more sensitive in detecting a significant shift in FLIM signal in the a/b KCs after conditioning, which would strengthen our last observation if confirmed.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors demonstrate that two human mutations in the BMP4 preproprotein cause a defect in proprotein cleavage and mature BMP4 ligand formation, leading to hypomorphic phenotypes in mouse knock-in alleles and in Xenopus embryo assays.

      Strengths:

      They provide compelling biochemical and in vivo analyses supporting their conclusions, showing reduced processing of the proprotein and concomitantly reduced mature BMP4 ligand protein, impressively, in mouse embryonic lysates. They perform excellent analyses of the embryonic and postnatal phenotypes, demonstrating the hypomorphic nature of these alleles. Interesting phenotypic differences between the S91C and E93G mutants are shown, with excellent hypotheses for the differences. Their results support the view that BMP4 heterodimers act predominantly throughout embryogenesis, whereas BMP4 homodimers play essential roles at later developmental stages.

      Weaknesses:

      (1) A control of BMP7 alone in the Xenopus assays seems important to exclude BMP7 homodimer activity in these assays.

      We and others have shown that BMP7 homodimers have weak or no activity, while BMP4/7 heterodimers signal at a much higher level than either BMP4 or BMP7 homodimers in Xenopus ectodermal and mesodermal cells. We have expanded the description of these published findings in the results section (lines 182-187). We have also added representative examples of experiments in which BMP4-alone and BMP7-alone controls are included (new Fig. S2). Since the level of activity of BMP7 + BMP4 variants is equivalent to that of BMP7 + WT BMP4, this activity cannot be accounted for by BMP7 homodimers.

      (2) The Discussion could be strengthened by more in-depth explanations of how BMP4 homodimer versus heterodimer signaling is supported by the results, so that readers do not have to think it all through themselves. Similarly, a discussion early in the Discussion of why the S91C mutant has a stronger phenotype than E93G would be helpful, or at least a mention that it will be addressed later.

      We have revised the discussion as suggested by the reviewer. Please see responses to recommendations 2-4 below.

      Reviewer #1 (Recommendations for the authors):

      (1) A control of BMP7 injection alone seems missing when comparing the BMP4/7 variants. BMP4 in the embryo assays presented in Fig 1. Is it not possible that the activity observed is BMP7 homodimers, perhaps due to inhibited heterodimer formation by the BMP4 variant?

      Multiple published studies have shown that BMP7 homodimers have weak or no activity in Xenopus ectodermal and mesodermal cells, and that ½ dose of RNA encoding BMP4 and BMP7 together signals at a higher level than does a full dose of RNA encoding either BMP4 or BMP7 alone. We have expanded our description of these published findings (lines 182-187), have included additional details about RNA doses that were injected (line 156, 175, 182) and have added representative examples of experiments in which BMP4 and BMP7 controls were included in a new Figure (Fig. S2).

      (2) In reading the Discussion, I was continually thinking of the stronger phenotype of the S91C mutant compared to the E93G one, although both are discussed together throughout most of the Discussion. Only at the end of the Discussion is the stronger phenotype of S91C discussed, with a compelling explanation for the stronger phenotype that is not related to the phosphorylation site function. I wonder whether it would be better placed earlier in the Discussion, or whether the difference in phenotypes could at least be mentioned early with a note that it will be discussed later.

      We have moved the possible explanation of differences between Bmp4<sup>S91C</sup> and Bmp4<sup>E93G</sup> mutants to immediately follow the introductory paragraph of the results section.

      (3) Along these same lines, why is it that the E93G exhibits rather normal cleavage at E10.5? Might the mechanisms of cleavage vary in different contexts with phosphorylation-dependent cleavage not functioning at early stages of development? I believe the hypothesis is that it is cleaved due to heterodimerization with BMP7. More discussion of this excellent hypothesis should be provided with clear statements, rather than inferences, if I'm understanding this correctly. For example, I had to read 3 times the first sentence of the last paragraph on p.14 before I understood it. Better to break that sentence down and the one that follows it, so it is easier to understand.

      We have rewritten and expanded the paragraphs describing phenotypic and biochemical evidence for defective homodimer but not heterodimer signaling as suggested (lines 343-375). We have also more explicitly stated the possibility that normal cleavage of BMP4<sup>E93G</sup> in embryonic lysates may be due to a predominance of BMP4/7 heterodimers in early embryonic stages or to spatiotemporal differences in phosphorylation-dependent cleavage of BMP4 homodimers (lines 369-372).

      (4) Similarly the last paragraph of the Discussion mentions that the authors provide evidence of BMP4 homodimer signaling. I agree with the authors, but I had to think through the evidence myself. Better if the authors clearly explain the evidence that points to this, as this is a very good point of

      See response to point 3, above. Thank you for these useful suggestions.

      (5) Last sentence, first paragraph on p.11 should be qualified for the E93G mutant to E13.5, since it was normal at E10.5 regarding Figure 4 results.

      Thank you for pointing this out. It has been corrected.

      (6) Skip the PC acronym, since it is only repeated once in the text and hard to remember almost 10 pages later when it is used again.

      We have corrected this.

      (7) In the Discussion, a typo in "a single intramolecular disulfide bond that stabilizes the dimer", should be 'intermolecular'.

      Thank you for catching our switch in the use of inter- and intramolecular. We have corrected this (lines 334-335).

      (8) At times the E93G mutant is referred to having early lethality, often in conjunction with S91C, while other times it is referred to as late lethality. Considering that the homozygotes die postnatally after weaning, most would consider it late lethality. In contrast S91C is indeed an early lethal.

      We have changed the wording in the introduction to state that “mice carrying Bmp4<sup>S91C</sup> or Bmp4<sup>E93G</sup> knock in mutations show embryonic or enhanced postnatal lethality, respectively,… (lines 141-143)” and have removed the word “early” from the title.

      Reviewer #2 (Public review):

      Summary:

      Kim et al. report that two disease mutations in proBMP4, Ser91Cys and Glu93Gly, which disrupt the Ser91 FAM20C phosphorylation site, block the activation of proBMP4 homodimers. Consequently, analysis of DMZ explants from Xenopus embryos expressing the proBMP4 S91C or E93G mutants showed reduced pSmad1 and tbxt1 expression. The block in BMP4 activity caused by the mutations could be overcome by co-expression of BMP7, suggesting that the missense mutations selectively affect the activity of BMP4 homodimers but not BMP4/7 heterodimers. The expert amphibian tissue transplant studies were extended to in vivo studies in Bmp4<sup>S91C/+</sup> and Bmp4<sup>E93G/+</sup> mice, demonstrating the impact of these mutations on embryonic development, particularly in female mice, in line with patient studies. Finally, studies in MEFs revealed that the mutations did not affect proBMP4 glycosylation or ER-to-Golgi transport but appeared to inhibit the furin-dependent cleavage of proBMP4 to BMP4. Based on these findings and AI (AlphaFold) modeling of proBMP4, the authors speculate that pSer91 influences access of furin to its cleavage site at Arg289AlaLysArg292.

      Strengths:

      The Xenopus and mouse studies are valuable and elegantly describe the impact of the S91C and E93G disease mutations on BMP signaling and embryonic development.

      Weaknesses:

      The interpretation of how the mutations may disturb the furin-mediated cleavage of proBMP4 is underdeveloped and does not consider all of their data. Understanding how pS91 influences the furin-dependent cleavage at Arg292 seems to be the crux of this work and thus warrants more consideration. Specifically:

      (1) Figure S1 may be significantly more informative than implied. The authors report that BMP4S91D activates pSmad1 only incrementally better than S91C and much less than WT BMP4. However, Fig. S1B does not support the conclusion on page 7 (numbering beginning with title page); "these findings suggest that phosphorylation of S91 is required to generate fully active BMP4 homodimers". The authors rightly note that the S91C change likely has manifold effects beyond inhibiting furin cleavage. The E93G change may also affect proBMP4 beyond disturbing FAM20C phosphorylation. Additional mutation analyses would strengthen the work.

      The major goal of generating and comparing the activity of the S91D mutant with S91C was to control for phosphorylation-independent defects caused by the deleterious introduction of a cysteine residue, which might cause aberrant disulfide bonding. We opted to introduce S91D since “phosphomimics” can sometimes approximate the phosphorylated state. S91D has significantly higher activity than S91C (p<0.01) and has a less significant loss of activity (p<0.05) than does S91C (p<0.0001) relative to wild type BMP4 (Fig. S1), consistent with deleterious effects of the cysteine residue and supporting a possible explanation for the more severe phenotype of S91C vs E93G mice. We have rewritten this section to clarify our interpretation (lines 165-174) and have changed our statement that our activity data “suggest the importance of phosphorylation” to a statement that they are consistent with this possibility (lines 179-180). We do not believe that further mutational analysis using activity assays in Xenopus would shed light on how or whether phosphorylation affects proteolytic activation of BMP4.

      (2) These findings in Figure S1 are potentially significant because they may inform how proBMP4 is protected from cleavage during transit through the TGN and entry into peripheral cellular compartments. Intriguing modeling studies in Figure 6 suggest that pSer91 is proximal to the furin cleavage site. Based on their presentation, pSer91 may contact Arg289, the critical P4 residue at the furin site. If so, might that suggest how pS91 may prevent furin cleavage, thus explaining why the S91D mutation inhibits processing as presented, and possibly how proBMP4 processing is delayed until transit to distal compartments (perhaps activated by a change in the endosomal microenvironment or a Ser91 phosphatase)? Have the authors considered or ruled out these possibilities? In addition to additional mutation analyses of the FAM20C site, moving the discussion of this model to an "Ideas and Speculation" subsection may be warranted.

      The model shown in Fig. 6B proposes the possibility that phosphorylation unmasks (rather than prevents access to) the furin cleavage motif due to the proximity of Ser91 to the cleavage site (lines 399-402). If S91D truly mimicked phosphorylation, we would predict that it would facilitate processing rather than inhibit it. We do not have data comparing cleavage of S91D relative to wild type BMP4 and have not generated knock-in S91D mice to test this idea. While the reviewer's questions are intriguing, they cannot be answered by mutational analysis of the FAM20C site and are beyond the scope of the current studies, which sought to understand the impact of human pS91C and pE93G mutations and their cell biological implications. We have moved the models to an “Ideas and Speculation” subsection as suggested (lines 377-414), since these models are meant to provoke further thought rather than provide definitive answers based on our data.

      (3) The lack of an in vitro protease assay to test the effect of the S91 mutations on furin cleavage is problematic.

      Although we routinely perform in vitro cleavage assays with recombinant furin, we don’t believe they would be informative about how S91 phosphorylation or mutation of this residue impacts cleavage, since the in vitro synthesized substrate used in these assays is neither dimerized nor post-translationally modified, and cleavage would be tested in isolation from the endogenous trafficking environment that we propose influences cleavage.

      Reviewer #2 (Recommendations for the authors):

      (1) The impact of BMP4 S91A should be determined and paired with the S91D phosphomimic data to reveal whether it causes proBMP4 to be cleaved prematurely and disturbs pSmad1 expression. Data for S93G should also be included.

      Our major goal in comparing the activity of S91D with S91C was to control for phosphorylation-independent defects caused by the deleterious introduction of a cysteine residue in S91C, which might cause aberrant disulfide bonding. We opted to introduce S91D since “phosphomimics” can sometimes approximate the phosphorylated state. We note that S91D has significantly higher activity than S91C, consistent with deleterious effects of the cysteine residue and supporting a possible explanation for the more severe phenotype of S91C vs E93G mice. We have revised the wording of this section to clarify this. Our models predict that S91D would be cleaved more efficiently than S91C or S91A, if it really mimics the endogenous phosphorylated state, rather than being cleaved prematurely. Our biochemical analysis compares cleavage of endogenous BMP4 in wild type and mutant MEFs. Generation of S91D, S91A or S93G mutant mice to compare cleavage is beyond the scope of the current work.

      (2) Is the distance between pS91 and Arg289 close enough to form a hydrogen bond? If so, might this interaction influence furin access?

      AI modeling does not provide high probability prediction of structures surrounding the furin motif (see Fig. S7) and thus we cannot comment on whether or not these residues are close enough to form a hydrogen bond. We have revised the wording of the discussion to state “This simple model building indicates the possibility of direct contact between pSer91 and Arg289, and that phosphorylation is required for furin to access the cleavage site, although we note that predictions surrounding the furin motif represent low probability conformations (Fig. S7) (lines 399-402).”

      (3) The genotypes in Figure 2 are labeled awkwardly. Consider labeling the headers for the three subsections of panels (A-F, G-L, and M-O) differently.

      We have revised Fig. 2 to clarify that the three subsections of panels are distinct, and to emphasize that the middle subsection represents views of the right and left side of the same embryo.

      (4) The tables should be reformatted. As is, the labeling is frequently cut off, and the numbers of expected and observed progeny should both be stated to aid the reader.

      We thank the reviewer for noting the formatting errors in the tables, which we have corrected. We have also changed the tables so that normal or abnormal mendelian distributions are reported as numbers of observed/expected progeny rather than numbers/percent observed progeny.

      Reviewer #3 (Public review):

      Summary:

      The authors describe important new biochemical elements in the synthesis of a class of critical developmental signaling molecules, BMP4. They also present a highly detailed description of developmental anomalies in mice bearing known human mutations at these specific elements.

      Strengths:

      Exceptionally detailed descriptions of pathologies occurring in mutant mice. Novel findings regarding the interaction of propeptide phosphorylation and convertase cleavage, both of which will move the field forward. Provocative hypothesis regarding furin access to cleavage sites, supported by Alphafold predictions.

      Weaknesses:

      Figure 6A presents two testable models for pre-release access of furin to cleavage sites since physical separation of enzyme from substrate only occurs in one model; could immunocytochemistry resolve?

      Available reagents are not sensitive enough to detect endogenous furin and BMP4 with high resolution. Because PC/substrate interactions are transient, whereas the bulk of furin and BMP4 is distributed throughout the secretory pathway, it is not possible to co-immunolocalize furin and BMP4 in vivo at present. Studies using more advanced cell biological techniques along with tagged proteins may enable us to test these hypotheses in the future.

      Reviewer #3 (Recommendations for the authors):

      This interesting paper presents new data on an important family of developmental signaling molecules, BMPs. Mutations at FAM20C consensus sites within BMP prodomains are known to cause birth defects. The authors have here explored differential effects of human mutations on hetero- and homodimer activity and maturation, issues that may well arise during human development. In addition to demonstrating the profound effect of these mutations on development in Xenopus and mice, the authors also show differential processing of BMP4 precursors bearing these mutations in MEF cells prepared from mutant embryos. Finally, they show that FAM20C plays a role in BMP4 prodomain processing with quite differing outcomes in homo- vs heterodimers, which they suggest is due to structural differences impacting furin access. While this latter idea remains speculative due to the lack of crystal structures (models are based on AlphaFold), it is a highly promising line of work.

      The data are beautifully presented and will be of clear interest to all developmental biologists. Certain cell biology results may also extrapolate to other phosphorylated precursor molecules undergoing the interesting (and as yet unexplained) phenomenon of convertase cleavage immediately before secretion, for example, FGF23. I have only a few minor comments regarding the presentation, which is remarkably clear.

      (1) The introduction of BMP7 in the Abstract is abrupt. It should be described as a preferred dimerization partner for BMP4.

      Thank you for noting this. We have revised the first sentence of the abstract to better introduce BMP7 (lines 49-50).

      (2) In Figure 1A, what is the small light green box?

      This is a small fragment released from the prodomain by the second cleavage. We have clarified this in the introduction (lines 112-114) and in the legend to Figure 1 (lines 758-759).

      (3) In the Discussion it might be relevant to mention that FAM20C propeptide is not cleaved by convertases but by S1P (Chen 2021).

      We have added this information to clarify (lines 394-396).

      (4) Figure 3, define VSD; Figure 5, Endo H removes sugars only from immature (nonsialylated) sugars, not from all chains as implied. More importantly, EndoH and PNGase remove N-linked sugars, yet Results refer only to O-linked glycosylation.

      Thank you for noting these oversights. We have defined VSD in Figure 3. We have also revised the headers for Fig. 5 and for the relevant subsection of the results to include N-linked glycosylation and note in the results that EndoH removes only immature N-linked carbohydrates (lines 301-304).

      (5) Figure 5- for clarity, I suggest it be broken up into two larger panels labeled "Embryos" and "MEFs"

      Thank you for this suggestion, we have subdivided the Figure into two panels.

      (6) Figure 6A presents two testable models for pre-release access of furin to cleavage sites since the physical separation of the enzyme from substrate only occurs in one model; could confocal immunocytochemistry resolve?

      Available reagents are not sensitive enough to detect endogenous furin and BMP4 with high resolution, and PC/substrate interactions are transient, whereas the bulk of both furin and BMP4 is in transit through the secretory pathway. For these reasons, it is not possible to co-immunolocalize furin and BMP4 in vivo. Studies using advanced cell biological techniques may enable us to test these hypotheses in the future.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript uses the eye lens as a model to investigate basic mechanisms in the Fgf signaling pathway. Understanding Fgf signaling is of broad importance to biologists as it is involved in the regulation of various developmental processes in different tissues/organs and is often misregulated in disease states. The Fgf pathway has been studied in embryonic lens development, namely with regards to its involvement in controlling events such as tissue invagination, vesicle formation, epithelium proliferation and cellular differentiation, thus making the lens a good system to uncover the mechanistic basis of how the modulation of this pathway drives specific outcomes. Previous work has suggested that proteins, other than the ones currently known (e.g., the adaptor protein Frs2), are likely involved in Fgfr signaling. The present study focuses on the role of Shp2 and Shc1 proteins in the recruitment of Grb2 in the events downstream of Fgfr activation.

      Strengths:

      The findings reveal that the juxtamembrane region of the Fgf receptor is necessary for proper control of downstream events such as facilitating key changes in transcription and cytoskeleton during tissue morphogenesis. The authors conditionally deleted all four Fgfrs in the mouse lens, which resulted in molecular and morphological lens defects, most importantly preventing the upregulation of the lens induction markers Sox2 and Foxe3 and the apical localization of F-actin, thus demonstrating the importance of Fgfrs in early lens development, i.e. during lens induction. They also examined the impact of deleting Fgfr1 and 2 on the following stage, i.e. lens vesicle development, which could be rescued by expressing constitutively active KrasG12D. By using specific mutations (e.g. Fgfr1ΔFrs lacking the Frs2 binding domain and Fgfr2LR harboring mutations that prevent binding of Frs2), it is demonstrated that the Frs2 binding site on Fgfr is necessary for specific events such as morphogenesis of the lens vesicle. Further, by studying Shp2 mutations and deletions, the authors present a case for Shp2 protein functioning in a context-specific manner, both as an adaptor protein and as a phosphatase enzyme. Finally, the key surprising finding from this study is that downstream of Fgfr signaling, Shc1 is an important alternative pathway, in addition to Shp2, involved in the recruitment of Grb2 and in the subsequent activation of Ras. The methodologies, namely mouse genetics and state-of-the-art cell/molecular/biochemical assays, are appropriately used to collect the data, which are soundly interpreted to reach these important conclusions. Overall, these findings reveal the flexibility of the Fgf signaling pathway and its downstream mediators in regulating cellular events. This work is expected to be of broad interest to molecular and developmental biologists.

      Weaknesses:

      A weakness that needs to be discussed is that Le-Cre depends on Pax6 activation, and hence its use in specific gene deletion does not allow evaluation of the requirement of Fgfrs for the expression of Pax6 itself. Since this is nevertheless the earliest Cre available for deletion in the lens, mentioning this limitation in the Discussion would make readers aware of the issue.

      Reviewer #2 (Public review):

      Summary

      I have reviewed the revised manuscript submitted by Wang et al., entitled "Shc1 cooperates with Frs2 and Shp2 to recruit Grb2 in FGF-induced lens development". In this paper, the authors first examined lens phenotypes in mice with Le-Cre-mediated knockdown (KD) of all four FGFRs (FGFR1-4) and found that pERK signals and Jag1 and Foxe3 expression are absent or drastically reduced, indicating that FGF signaling is essential for lens induction. Next, the authors examined lens phenotypes of FGFR1/2-KD mice and found that lens fiber differentiation is compromised and that proliferative activity and cell survival are also compromised in the lens epithelium. Interestingly, Kras activation rescues defects in lens growth and lens fiber differentiation in FGFR1/2-KD mice, indicating that Ras activation is a key step for lens development downstream of FGF signaling. Next, the authors examined the role of Frs2, Shp2 and Grb2 in FGF signaling for lens development. They confirmed that lens fiber differentiation is compromised in FGFR1/3-KD mice combined with Frs2-dysfunctional FGFR2 mutants, which is similar to the lens phenotypes of Grb2-KD mice. However, lens defects are milder in mice with Shp2YF/YF and Shp2CS mutant alleles, indicating that the involvement of Shp2 in Grb2 recruitment for lens fiber differentiation is limited. Lastly, the authors presented new evidence that another adapter protein, Shc1, may promote Grb2 recruitment independently of Frs2/Shp2-mediated Grb2 recruitment.

      Strength

      Overall, the manuscript provides valuable data on how FGFR activation leads to Ras activation through the adapter platform of Frs2/Shp2/Grb2, which advances our understanding of the complex modification of the FGF signaling pathway. The authors applied a genetic approach using mice, and their methods and results are valid and support the conclusion. The discussion also summarizes the significance of their findings well.

      Weakness

      The authors found that the new adaptor protein Shc1 is involved in Grb2 recruitment in response to FGF receptor activation. However, the main data on Shc1 are only histological sections and statistical evaluation of lens size. In the revised manuscript, the authors did not address my major concern that cellular-level data are missing; the current data are not sufficient to support their main conclusion on the involvement of Shc1 in Grb2 recruitment during FGF signaling in lens development. Since the title of this manuscript states that Shc1 cooperates with Frs2 and Shp2 to recruit Grb2 in FGF-induced lens development, it is important to provide cellular-level evidence on Shc1.

      Reviewer #3 (Public review):

      Summary:

      The manuscript entitled "Shc1 cooperates with Frs2 and Shp2 to recruit Grb2 in FGF-induced lens development" by Wang et al. investigates the molecular mechanism used by FGFR signaling to support lens development. The lens has long been known to depend on FGFR signaling for proper development. Previous investigations have demonstrated that FGFR signaling is required for embryonic lens cell survival and for lens fiber cell differentiation. The requirement of FGFR signaling for lens induction has remained more controversial, as deletion of both Fgfr1 and Fgfr2 during lens placode formation does not prevent the induction of definitive lens markers such as FOXE3 or αA-crystallin. Here the authors have used the Le-Cre driver to delete all four FGFR genes from the developing lens placode, demonstrating a definitive failure of lens induction in the absence of FGFR signaling. The authors focused on FGFR1 and FGFR2, the two primary FGFRs present during early lens development, and demonstrated that lens development could be significantly rescued in lenses lacking both FGFR1 and FGFR2 by expressing a constitutively active allele of KRAS. They also showed that the removal of the pro-apoptotic genes Bax and Bak could also lead to a substantial rescue of lens development in lenses lacking both FGFR1 and FGFR2. In both cases, the lens rescue included both increased lens size and the expression of genes characteristic of lens cells.

      Significantly, the authors concentrated on the juxtamembrane domain, a portion of the FGFRs associated with FRS2. Previous investigations have demonstrated the importance of FRS2 activation for mediating a sustained level of ERK activation. FRS2 is known to associate with both GRB2 and SHP2 to activate RAS. The authors utilized a mutant allele of Fgfr1 lacking the entire juxtamembrane domain (Fgfr1ΔFrs) and an allele of Fgfr2 containing two point mutations essential for Frs2 binding (Fgfr2LR). When combining three floxed alleles and leaving only one functional allele (Fgfr1ΔFrs or Fgfr2LR), the authors obtained strikingly different phenotypes. When only the Fgfr1ΔFrs allele was retained, the lens phenotype matched that of deleting both Fgfr1 and Fgfr2. However, when only the Fgfr2LR allele was retained, the phenotype was significantly milder, primarily affecting lens fiber cell differentiation, suggesting that something other than FRS2 might be interacting with the juxtamembrane domain to support FGFR signaling in the lens. The authors also deleted Grb2 in the lens and showed that the phenotype was similar to that of the lenses retaining only the Fgfr2LR allele, resulting in a failure of lens fiber cell differentiation and decreased lens cell survival. However, mutating the major tyrosine phosphorylation site of GRB2 did not affect lens development. The authors additionally investigated the role of SHP2 in lens development by either deleting SHP2 or by making mutations in the SHP2 catalytic domain. The deletion of SHP2 phosphatase activity did not affect lens development as severely as total loss of SHP2 protein, suggesting a function for SHP2 outside of its catalytic activity. Although the loss of Shc1 alone has only a slight effect on lens size and pERK activation in the lens, the authors showed that the loss of Shc1 exacerbated the lens phenotype in lenses lacking both Frs2 and Shp2. The authors suggest that SHC1 binds to the FGFR juxtamembrane domain, allowing for the recruitment of GRB2 independently of FRS2.

      Strengths:

      (1) The authors used a variety of genetic tools to carefully dissect the essential signals downstream of FGFR signaling during lens development.

      (2) The authors made a convincing case that something other than FRS2 binding mediates FGFR signaling in the juxtamembrane domain.

      (3) The authors demonstrated that although both the adaptor function and the phosphatase activity of SHP2 are required for embryonic survival, neither of these activities is absolutely required for lens development.

      (4) The authors provide more information as to why FGFR loss has a phenotype much more severe than the loss of FRS2 alone during lens development.

      (5) The authors followed up their work analyzing various signaling molecules in the context of lens development with biochemical analyses of FGF-induced phosphorylation in murine embryonic fibroblasts (MEFs).

      (6) In general, this manuscript represents a Herculean effort to dissect FGFR signaling in vivo, backed by biochemical analyses of cell culture experiments in vitro.

      Weaknesses:

      (1) The authors demonstrate that the loss of FGFR1 and FGFR2 can be compensated by a constitutively active KRAS allele in the lens and suggest that FGFRs support lens development largely by driving ERK activation. However, the authors also saw that lens development was substantially rescued by preventing apoptosis through the deletion of BAK and BAX. To my knowledge, the deletion of BAK and BAX should not independently activate ERK. The authors do not show whether ERK activation is restored in the BAK/BAX-deficient lenses. Do the authors suggest that FGFR3 and/or FGFR4 provide sufficient RAS and ERK activation for lens development when apoptosis is suppressed? Alternatively, is the survival function of FGFR signaling as important as its direct effect on lens differentiation?

      (2) Do the authors suggest that GRB2 is required for RAS activation and ultimately ERK activation? If so, do the authors suggest that ERK activation is not required for FGFR signaling to mediate lens induction? This would follow, considering that the GRB2-deficient lenses show no defect in lens induction.

      (3) The p-Shc level is only slightly higher in Cre FGFR1f/f FGFR2f/LR than in FGFR1f/Δfrs FGFR2f/f. Can the authors provide quantification?

      (4) The authors have not shown directly that Shc1 binds to the juxtamembrane region of either Fgfr1 or Fgfr2.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      In the revised manuscript, the authors have responded to my recommendations to revise the original manuscript, except for three suggestions below.

      (1) The original recommendation: Results (page 6, line 8): The authors mentioned "we observed .... expression of Foxe3 in ...mutant lens cells (Figure 1E, arrows)". However, Foxe3-expressing lens cells are a very small population in Figure 1E. It is important to state the decreased number of Foxe3-expressing lens cells in FGFR1/2 mutants. In addition, I would like to request that the authors show histograms indicating sample size and statistical analysis for marker expression: Foxe3 (Figure 1E), Prox1 and αA-crystallin (Fig. 1F), cyclin D1 and TUNEL (Fig. 1G) and pmTOR and pS6 (Supplementary figure 1B).

      Author's response: We added a statement indicating that the number of Foxe3-expressing cells is reduced in FGFR1/2 mutants, which is now quantified in Fig. 1H. Quantifications for Cyclin D1 and TUNEL are now shown in Fig. 1I and J, respectively. However, we chose not to quantify Prox1, αA-crystallin, pmTOR, and pS6, as the FGFR1/2 mutants showed no staining for these markers.

      My recommendation: Although the authors have added quantification of Foxe3-expressing cells, Cyclin D1 and TUNEL, they did not conduct statistical analysis of Prox1, αA-crystallin, pmTOR, and pS6 because these marker signals are absent. I understand that statistical analysis is not meaningful when there is no signal. However, it is still important to indicate how many times the authors repeated the experiments to confirm the same result. Please indicate the number of biological replicates or independent experiments in the figure legends, for example "Biological replicates, n=3" or "Three independent experiments show similar results". As for pS6 labeling, there seems to be a weak signal in Supplementary Figure 1B, so please show statistical analysis with a histogram.

      We have added the number of biological replicates for Prox1 and αA-crystallin staining in the legend of Fig. 1. The reviewer is correct that there is weak staining of pS6, and also of pmTOR. Quantifications of pS6 and pmTOR staining are now shown in Supplementary Fig. 1C and D.

      (2) The original recommendation: Results (page 6, line 19 to page 7, line 6): The authors showed that inducible expression of constitutively active Kras, KrasG12D, using Le-Cre, recovered lens size to about half that of wild-type controls. However, in the lens of mice with Le-Cre; FGFR1/2f/f; LSL-KrasG12D, pERK was detected in the most posterior edge of the lens fiber core, whereas pERK was detected in a broader area of the lens in controls. Furthermore, pMEK was detected in the whole lens of mice with Le-Cre; FGFR1/2f/f; LSL-KrasG12D, whereas pMEK was detected only in the lens epithelial cells at the equator in controls. So, the spatial profile of pERK and pMEK expression was different from that of wild-type, although the authors observed that Prox1 and crystallin expression are normally induced in the lens of mice with Le-Cre; FGFR1/2f/f; LSL-KrasG12D. I wonder whether the lens develops normally in mice with Le-Cre; LSL-KrasG12D. Is lens growth enhanced in mice with Le-Cre; LSL-KrasG12D? Please add panels of mice with Le-Cre; LSL-KrasG12D in Figure 2B and 2C. In addition, I wonder whether apoptosis is suppressed in the lens of mice with Le-Cre; FGFR1/2f/f; LSL-KrasG12D.

      Authors' response: As we previously reported (Developmental Biology 355, 2011, 12-20), Le-Cre; LSL-KrasG12D did not lead to enhanced lens growth. While we agree that including images of Le-Cre; LSL-KrasG12D as controls in Fig. 2B and C and evaluating apoptosis in Le-Cre; FGFR1/2f/f; LSL-KrasG12D mutants would be appropriate, we regretfully no longer have these animals available to conduct these experiments.

      My recommendation: I would like to suggest that the authors conduct these experiments again, because the recovery of lens formation by Bax/Bak KD in Fgfr1/2 KD mice (Fig. 2F) suggests that KrasG12D activates the AKT-mediated cell survival pathway as well as the MEK/MAPK pathway downstream of FGF signaling. Regarding the availability of mouse strains, it is generally necessary to keep animal strains available in order to respond properly to reviewers' suggestions. Please clarify why these strains are no longer available and justify the reason in the response to reviewers' recommendations.

      We acknowledge the reviewer's suggested experiments. However, our research utilized multiple mouse strains that are costly to maintain, a challenge that was exacerbated during and after the COVID-19 pandemic. Unfortunately, we no longer have access to the specific mouse strains required to conduct these additional studies.

      (3) The original recommendation: Figures 7E and 7F: The authors showed lens morphology and lens size evaluation in the following genetic combinations: control, Frs2/Shc1 KD, Frs2/Shp2 KD, and Frs2/Shp2/Shc1 KD. However, I would like to request that the authors show more detailed data for these genetic combinations, for example pERK, Foxe3, Maf, Prox1, Jag1, p57, cyclin D3, γ-crystallin, and TUNEL.

      Authors' response: Unfortunately, we no longer have these mutant mice to perform these detailed staining.

      My recommendation: As I mentioned in the statement on weaknesses above, it is important to provide cellular-level evidence to support the main conclusion on the involvement of Shc1 in Grb2 recruitment during FGF signaling in lens development, because this is the main novel finding of this manuscript. Regarding the availability of mouse strains, it is generally necessary to keep animal strains available in order to respond properly to reviewers' suggestions. Please clarify why these strains are no longer available and justify the reason in the response to the reviewers' suggestions.

      We regret that we did not anticipate these experiments suggested by the reviewer. Unfortunately, we are unable to perform these studies as we no longer maintain the required mouse strains in our colony.

      Reviewer #3 (Recommendations for the authors):

      The changes made by the authors improved the manuscript. I have no further suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript, Kong Fang et al. describe a robust pipeline for the isolation of small extracellular vesicles through a combination of size exclusion chromatography and miniaturized density gradient separation. They then show that the method is reproducible and suitable for small-volume operations without compromising the quality of the vesicles.

      Strengths:

      The paper describes a robust method for purifying high-quality sEVs from small amounts of blood plasma. The authors also demonstrate that this approach yields sEVs without compromising protein composition or vesicle integrity, and without contamination by other proteins or lipids.

      Weaknesses:

      The paper is a nice summary of how to enrich sEVs from blood samples. Although well performed and substantiated with data, the paper primarily deals with method development and optimisation.

      We agree with the reviewer's assessment that this paper primarily focuses on the development and optimization of a method. Using this robust technique for isolating small extracellular vesicles (sEVs) from small blood volumes, our future research will investigate sEVs isolated from clinical samples, with a particular focus on their role in various diseases.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors optimize a simple and rapid protocol using SEC followed by DGUC to isolate sEVs with adequate purity and yield from small volumes of plasma. Fractions containing sEVs isolated using SEC, DGUC, SEC-DGUC, and DGUC-SEC are compared in terms of their yield, purity, surface protein profile, and RNA content. Although the combined use of these methodologies has already been evaluated in previous works, the authors adapt them for small volumes of plasma, which allows working in 1.5 mL tubes and reduces the centrifugation time to 2 hours.

      The authors finally find that although both the SEC-DGCU and DGCU-SEC combinations achieve isolates with high purity, the SEC-DGCU combination results in higher yields.

      This work provides an interesting tool for the rapid isolation of sEVs with sufficient yield and purity for detailed characterization, which could be very useful in research and clinical therapy.

      Strengths:

      - The work is well-written and organized.

      - The authors clearly state the problem they want to address, that is, optimizing a method that allows sEV to be isolated from small volumes of plasma.

      - Although these methodologies have been tested in previous works, the authors manage to isolate sEVs of high purity and good performance through a simple and fast methodology.

      - The characteristics of all isolated fractions are exhaustively analyzed through various state-of-the-art methodologies.

      - They present a good interpretation of the results obtained through the methodologies used.

      Weaknesses:

      - Lack of references that support some of the results obtained.

      - Although this work focuses on comparing different techniques and their combinations to find an optimal option, the authors do not use any statistical method that reliably shows the differences between these techniques, except when repeatability is measured.

      We appreciate the reviewer's insightful comments and will incorporate the suggested missing references. We acknowledge that we did not perform statistical analyses when comparing the differences among the three methods. Nevertheless, the superiority of the SEC-DGUC method is evident from observations based on several independent characterization methods, including Cryo-EM, TEM, western blot, and total RNA quantification.

      Firstly, repeated Cryo-EM observations consistently confirm that the SEC-alone method shows severe lipoprotein contamination, while the SEC-DGUC method drastically reduces such contamination. In comparing the SEC-DGUC and DGUC-SEC methods, multiple independent characterization methods showed that the SEC-DGUC method yields a substantially greater quantity of sEVs: 1) The western blot experiment showed much higher signal intensity for all four tested sEV markers (CD9, CD63, CD81, and TSG101), with estimated concentrations approximately 2.1, 2.1, 4.7, and 4.2 times higher than with the DGUC-SEC method. 2) The total RNA analysis showed that SEC-DGUC-1 contained more than 4 times the total amount of RNA compared to DGUC-SEC-PF. 3) To establish a normalization baseline, we measured particle size distributions in SEC-DGUC-1 and DGUC-SEC-PF by TEM and found them to be similar, suggesting comparable purity and distribution of the captured sEVs. For comparison purposes, within each independent characterization method, the same plasma source and total plasma volume were used, while different plasma sources were used across methods. These independent characterization methods consistently demonstrated the superiority of the SEC-DGUC method over the DGUC-SEC and SEC-alone methods.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In my opinion, this work is elegantly designed and supported by data, which would motivate more studies related to blood-derived microvesicles in the context of infectious and systemic diseases. Overall, the manuscript is well-written and explained in sufficient detail. I only have minor comments.

      (1) Recruitment of volunteers for blood/plasma collection: there is a need for a statement that this was in accordance with ethical and biosafety regulations of the Institute/Clinic.

      We added two sentences at the beginning of the Blood Collection section (under Materials and methods): “All procedures involving peripheral blood specimens were approved by the Singapore National Health Group Domain Specific Review Board (the central ethics committee) and were mutually recognized by the Nanyang Technological University Institutional Review Board (IRB#2018/00671). All blood specimens were de-identified prior to their use in the experiments.”

      (2) Since this is a method development and validation article, it would be good to include an image of the iodixanol gradient with the high-density sEV zone, after centrifugation.

      We have incorporated an image after centrifugation in Supplementary Figure 3.

      (3) Although several sEV markers are shown in Figure 7A, flotillin, which was part of Figure 6B, is missing from this figure. Does flotillin show a different pattern? Flotillin is a DRM-associated marker and hence may behave differently; it would be interesting to add any insights.

      We appreciate the reviewer’s careful observation. In Figure 6B, Flotillin was used to confirm the presence of sEVs in different density zones. However, for the purpose of comparing the yield between the SEC-DGUC and DGUC-SEC methods, as shown in Figure 7A, Flotillin was not included in the western blot analysis. No obvious pattern changes were observed in other sEV markers tested in both Figures 6B and 7A.   

      (4) Methods section of LC/MS analysis- which protein database was used for protein identification?

      We added the following sentence at the end of the LC/MS analysis section: “The protein database used for protein identification was Uniprot Human.”

      Reviewer #2 (Recommendations For The Authors):

      In line 43 some references are needed.

      We added this reference: EL Andaloussi, S., Mäger, I., Breakefield, X. et al. Extracellular vesicles: biology and emerging therapeutic opportunities. Nat Rev Drug Discov 12, 347–357 (2013). https://doi.org/10.1038/nrd3978

      In line 107, please avoid using short forms such as "it's".

      We have revised that to “it is.”

      In line 153: "...separates low-density particles from those of high density, but a considerable amount of..." the word "but" should not be in the sentence.

      We have removed “but” in this sentence.

      In line 181 the authors establish that "Notably, SEC-PF exhibited a high level of ApoB and low expression of sEV markers." Is there any explanation for this?

      SEC-PF represents the eluate from the SEC step, collected before the DGUC step. This fraction contains a mixture of lipoproteins and sEVs. Due to the overwhelming abundance of lipoproteins compared to sEVs, the western blot predictably shows a high level of ApoB with minimal expression of sEV markers. This highlights that SEC alone effectively reduces plasma protein content but does not efficiently remove lipoproteins. Figure 6C further illustrates this point, as cryo-EM images of SEC-PF reveal the presence of sEVs, which are vastly outnumbered by lipoproteins.

      In line 198, the sentence "Theoretically, the DGUC-SEC protocol should also effectively isolate sEVs from plasma" needs to be supported by references.

      See for instance:

      - Holcar M, Ferdin J, Sitar S, Tušek-Žnidarič M, Dolžan V, Plemenitaš A, Žagar E, Lenassi M. 2020. Enrichment of plasma extracellular vesicles for reliable quantification of their size and concentration for biomarker discovery. Sci Rep 10:21346. doi:10.1038/s41598-020-78422-y.

      - Jia Y, Yu L, Ma T, Xu W, Qian H, Sun Y, Shi H. 2022. Small extracellular vesicles isolation and separation: Current techniques, pending questions and clinical applications. Theranostics 12:6548-6575. doi:10.7150/thno.74305

      - Vergauwen G, Dhondt B, Van Deun J, De Smedt E, Berx G, Timmerman E, Gevaert K, Miinalainen I, Cocquyt V, Braems G, Van den Broecke R, Denys H, De Wever O, Hendrix A. 2017. Confounding factors of ultrafiltration and protein analysis in extracellular vesicle research. Sci Rep 7:2704. doi:10.1038/s41598-017-02599-y

      We have added this reference: Holcar M, Ferdin J, Sitar S, Tušek-Žnidarič M, Dolžan V, Plemenitaš A, Žagar E, Lenassi M. 2020. Enrichment of plasma extracellular vesicles for reliable quantification of their size and concentration for biomarker discovery. Sci Rep 10:21346. https://doi.org/10.1038/s41598-020-78422-y.  

      In line 309 the authors state that "NTA measured size distributions displayed well-overlapped histograms of particles". Could the authors analyze this overlap using a statistical test such as a chi-squared test?

      We have conducted a statistical analysis of the histogram similarities using the Jensen-Shannon Divergence (JSD) method. This is reflected in the manuscript under the results section, “Repeatability and reliability of the SEC-DGUC protocol”, where we state: “We then compared size distributions for each plasma fraction using Jensen-Shannon Divergence (JSD). The JSD values, which are well below 0.1 (Figure 10B), indicate a consistent population of isolated particles, as further supported by Supplementary Figure 8.” Additionally, we included JSD values in the legend of Figure 10B: “JSD values for SEC-DGUC-1 to 4 are 0.015, 0.006, 0.001, and 0.002, indicating strong similarities among the histograms.” These additions demonstrate the robustness and repeatability of the SEC-DGUC protocol.
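      For readers less familiar with the metric, the Jensen-Shannon divergence compares two normalized histograms P and Q through their average M = (P + Q)/2, as JSD(P, Q) = ½ KL(P‖M) + ½ KL(Q‖M), where KL is the Kullback-Leibler divergence. The Python sketch below is an illustration only, not the authors' analysis code. It assumes both NTA histograms share identical bin edges and uses base-2 logarithms, so values fall between 0 (identical distributions) and 1 (no overlap); the response does not state which logarithm base was used, so the 0.1 threshold should be read with that in mind. The function name and bin counts are hypothetical.

```python
import math


def _kl(p, q, base):
    # Kullback-Leibler divergence D(p || q); bins with p_i == 0 contribute 0
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(counts_a, counts_b, base=2.0):
    """JSD between two histograms that use the same bin edges."""
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must use identical bins")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("histograms must contain counts")
    # Normalize raw bin counts to probability distributions
    p = [c / total_a for c in counts_a]
    q = [c / total_b for c in counts_b]
    # Mixture distribution; m_i > 0 wherever p_i or q_i > 0, so no division by zero
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m, base) + 0.5 * _kl(q, m, base)


if __name__ == "__main__":
    # Hypothetical bin counts for two replicate particle size histograms
    rep1 = [12, 40, 85, 60, 20, 5]
    rep2 = [10, 42, 80, 65, 18, 6]
    print(f"JSD = {jensen_shannon_divergence(rep1, rep2):.4f}")
```

      Because the counts are normalized first, histograms with different total particle numbers can be compared directly. Note that SciPy's scipy.spatial.distance.jensenshannon returns the square root of this quantity (the Jensen-Shannon distance), so values computed under the two conventions are not interchangeable.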

      In line 360, "lasts ~ 16 hours or more." This statement needs a reference that supports this time.

      We have added this reference: Vergauwen, G. et al. Robust sequential biophysical fractionation of blood plasma to study variations in the biomolecular landscape of systemically circulating extracellular vesicles across clinical conditions. J Extracell Vesicles 10, e12122 (2021).

      In line 399, the reference format is different from the previously used format.

      This is corrected. We thank the reviewer for this careful examination.

      Line 466: This sentence is not quite clear. It can be understood that for every 0.5 mL of plasma, 2 mL of particle fraction are obtained and that for 6 mL of plasma, this method will give a total volume of 24 mL. However, it is not clear what is meant by the fact that it has been concentrated to 6 mL. While one can assume that those final 6 mL concentrates come from the initial 24 mL, perhaps the way this sentence was worded was not appropriate. I would recommend rewriting it for a simpler interpretation of how this method was performed.

      We have changed the sentence to: “For the DGUC experiment using the 12 ml tube, 24 ml of PFs were obtained from 6 ml of plasma and subsequently concentrated to 6 ml. The 6 ml of concentrated PFs were then transferred to a Beckman Coulter ultra-clear centrifuge tube (344059, Beckman Coulter, USA) for further processing.”

      Line 519: The authors established a second dilution to avoid absorbance values above 1.2. Is there any justification for this value, taking into account that the Lambert-Beer law presents more precision in the absorbance range of 0.2 to 0.8?

      We have added this reference: https://diagnostic.serumwerk.com/wp-content/uploads/2021/05/V05-Serumwerk.pdf

      Line 519-520: "Also included were water and 0.25 M sucrose as blanks". Perhaps authors could consider rephrasing this sentence.

      We have changed the sentence to: “The absorbance measurements were made against water and 0.25 M sucrose blanks.”

      In line 520, the sentence must say "each sample was made by triplicate".

      We have changed the sentence to: “Each sample was prepared by triplicate to reduce error.” We thank the reviewer for this suggestion.

      Line 673: The phrase "0.1% formic acid in 100% ACN" would be better, in my opinion, if it said "0.1% formic acid in ACN".

      Yes, these two expressions have the same meaning. However, to ensure clarity, we have updated the description to “0.1% formic acid in ACN.” We thank the reviewer for this suggestion.

      Supplementary Figure 1: in the Figure caption there is an error in the numbering: at the end, where it is written (E), it should be (F). Please, correct this.

      We have made the necessary correction and sincerely appreciate the reviewer’s attentiveness.

      Supplementary Figure 5: Some sEVs are hard to visualize due to poor image resolution. Is there any possibility for the authors to enhance these images?

      We thank the reviewer for this valuable comment. To improve the visual clarity of the images, we have opted to display four sub-figures instead of nine.

    1. Author response:

      We appreciate the effort the reviewers have put into evaluating our work, and will take the opportunity to revise and improve our submission. In response to the reviewer's comments, we will carefully revisit our manuscript to address the concerns they have raised. Specifically, we will ensure that our revised version is coherent with our annotations and public databases, clarify any discrepancy between the investigated proteins and gene models, and re-examine our discussion of the evolutionary implications in light of their suggestions. We are confident that these revisions will strengthen our work and provide a clearer understanding of our research findings.

    1. Author response:

      We sincerely thank all three reviewers for their time, comments, and valuable suggestions, which will help improve our manuscript. Below, we provide preliminary remarks addressing some of the key issues that have been raised.

      Reviewer 1:

      We agree with the reviewer on the challenge of accurately mapping reads to multigene families. We carefully considered this issue and addressed it by evaluating the performance of multiple aligners using simulated RNA-seq reads. Our results indicate that kallisto performs particularly well in this context, outperforming widely used aligners such as Bowtie2 and STAR. This is likely due to kallisto’s expectation-maximization (EM) algorithm (described in the Materials and Methods section), which employs a probabilistic model to assign reads from similar transcripts. Previous studies have demonstrated the effectiveness of this approach in quantifying highly repetitive sequences, such as transposons (doi.org/10.1093/bioinformatics/btv422). In the revised manuscript, we are considering the inclusion of a supplementary figure to further support the selection of the mapping algorithm.
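      The probabilistic read assignment mentioned above can be illustrated with a toy expectation-maximization loop. This is a minimal Python sketch of the general EM idea over read equivalence classes, not kallisto's implementation: it assumes all transcripts have equal effective length (kallisto additionally weights abundances by effective transcript length), and the function name, transcript names, and read counts are hypothetical. Reads compatible with several gene-family members are split in proportion to each member's current abundance estimate, and the estimates are updated until they stop changing.

```python
def em_abundances(eq_classes, transcripts, max_iter=1000, tol=1e-10):
    """Estimate the fraction of reads originating from each transcript.

    eq_classes maps a tuple of compatible transcript IDs to the number of
    reads compatible with exactly that set (a read "equivalence class").
    Assumes equal effective lengths for all transcripts.
    """
    total = sum(eq_classes.values())
    if total == 0:
        raise ValueError("no reads to assign")
    # Start from a uniform abundance estimate
    alpha = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(max_iter):
        assigned = {t: 0.0 for t in transcripts}
        # E-step: split each class's reads among its compatible transcripts
        # in proportion to their current abundance estimates
        for compatible, count in eq_classes.items():
            denom = sum(alpha[t] for t in compatible)
            if denom == 0:
                continue  # no compatible transcript currently has support
            for t in compatible:
                assigned[t] += count * alpha[t] / denom
        # M-step: new abundances are the expected read fractions
        new_alpha = {t: assigned[t] / total for t in transcripts}
        change = max(abs(new_alpha[t] - alpha[t]) for t in transcripts)
        alpha = new_alpha
        if change < tol:
            break
    return alpha


if __name__ == "__main__":
    # Hypothetical counts: A and B are similar family members; C has no reads
    classes = {("A",): 60, ("B",): 20, ("A", "B"): 40}
    estimates = em_abundances(classes, ["A", "B", "C"])
    for name, frac in estimates.items():
        print(f"{name}: {frac:.3f}")
```

      In this toy example the 40 reads shared by A and B are ultimately split 30:10, mirroring the 3:1 ratio of their uniquely assigned reads, while C, which has no compatible reads, converges to zero. A naive "discard multi-mapping reads" or "assign at random" strategy would not use the unique-read evidence in this way.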

      Reviewer 2:

      We believe that obtaining experimental evidence on the influence of multiple multigene families would represent a significant advancement in the field. However, we would like to emphasize that this is a short communication centered on a specific and biologically relevant observation within a single multigene family. The manuscript is not intended to comprehensively address all aspects of the experiment but rather to highlight what we consider an important biological phenomenon with potential functional implications.

      The influence of phenotypic heterogeneity and its possible advantages under environmental pressures have been previously proposed for Trypanosoma cruzi, related trypanosomatids, and other biological systems ranging from bacteria to tumors (see Seco-Hidalgo 2015, doi: 10.1098/rsob.150190, and Luzak 2021, doi: 10.1146/annurev-micro-040821-012953, for comprehensive reviews of this topic). While the reviewer is correct in noting that our model does not demonstrate a functional role for TcS heterogeneity, the experimental approaches required to address this question in a large multigene family are highly complex and beyond the scope of this study. However, we acknowledge the importance of making clear that the proposed functional implications remain speculative, and we will revise the manuscript accordingly.

      As the reviewer suggests, in the revised version of the manuscript, we will include additional analyses on the characteristics of frequently expressed TcS genes to identify common features that may explain their expression patterns.

      We appreciate the reviewer’s comments and suggestions regarding the clarity of methodological choices and the explanation of key concepts. Accordingly, we will refine the description of our methodology and ensure that our figures are more intuitive and self-explanatory.

      Reviewer 3:

      We recognize the limitations imposed by gene dropout in our data, as highlighted by the reviewer. In the manuscript, we have aimed to be transparent about this issue and discussed its impact in two separate sections (lines 110–121 and 175–181). To enhance clarity, we will revise these paragraphs to provide a more comprehensive discussion of this limitation. Unfortunately, gene dropout is an inherent limitation of 10x Genomics data. Trypanosomatids are no exception in this regard, and the general metrics of the single-cell RNA-seq data in other reports are comparable to those obtained in our experiment.

      Despite this important limitation, we believe that our comparative analyses (the contrast between TcS and ribosomal protein expression) provide valuable insights into a biological phenomenon with potential functional relevance for the parasite. Furthermore, we are actively working on generating single-cell RNA-seq data using alternative methodologies that improve gene dropout rates. We anticipate that these future studies will help clarify the extent of the phenomenon described in this work.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Liu et al. present glmSMA, a network-regularized linear model that integrates single-cell RNA-seq data with spatial transcriptomics, enabling high-resolution mapping of cellular locations across diverse datasets. Its dual regularization framework (L1 for sparsity and a generalized L2 penalty via a graph Laplacian for spatial smoothness) performs robustly and offers novel tools for spatial biology, despite some gaps in fully addressing spatial communication.
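      The dual regularization described in this summary can be sketched as a penalized least-squares problem. The following minimal sketch assumes a squared-error data term, one nonnegative weight per reference spot, and a symmetrized k-nearest-neighbour graph over spot coordinates for the Laplacian. It is solved here with proximal gradient descent (ISTA). These modelling choices, and the names `knn_laplacian` and `map_cell`, are illustrative assumptions, not glmSMA's published formulation or API.

```python
import numpy as np

def knn_laplacian(coords, k=4):
    """Unnormalized Laplacian L = D - A of a symmetrized k-nearest-neighbour graph."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                # a spot is not its own neighbour
    A = np.zeros_like(d)
    for i in range(len(coords)):
        A[i, np.argsort(d[i])[:k]] = 1.0
    A = np.maximum(A, A.T)                     # make the graph undirected
    return np.diag(A.sum(axis=1)) - A

def map_cell(y, X, L, lam1=0.01, lam2=0.01, n_iter=5000):
    """Proximal-gradient (ISTA) solver for
        0.5 * ||y - X w||^2 + lam1 * ||w||_1 + lam2 * w^T L w,  subject to w >= 0,
    where y is one cell's expression (genes), X is the reference (genes x spots),
    and w holds one weight per reference spot.
    """
    # Step size = 1 / Lipschitz constant of the gradient of the smooth terms
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + 2.0 * lam2 * np.linalg.norm(L, 2))
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) + 2.0 * lam2 * (L @ w)
        # Proximal step for lam1 * ||w||_1 combined with the nonnegativity constraint
        w = np.maximum(w - step * (grad + lam1), 0.0)
    return w
```

      Raising `lam1` concentrates the weights on fewer spots. Raising `lam2` spreads them over neighbouring spots. Dividing `w` by its sum gives one way to read the result as a probability vector over locations.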

      Overall, the manuscript is commendable for its comprehensive benchmarking across different spatial omics platforms and its novel application of regularized linear models for cell mapping. I think this manuscript can be improved by addressing method assumptions, expanding the discussion on feature dependence and cell type-specific biases, and clarifying the mechanism of spatial communication.

      The conclusions of this paper are mostly well supported by data, but some aspects of model development and performance evaluation need to be clarified and extended.

      We thank the reviewer for their thoughtful comments. We will clarify the model assumptions and the feature selection process to make them easier to follow. To clarify, the performance of glmSMA does not inherently depend on cell type; however, for some rare cell types, the small number of cells can lead to a drop in measured performance. To better illustrate our results and reduce cell type-specific biases, we will shuffle and randomly sample the cell types.

      (1) What assumptions underlie the model? One could be a linear relationship between cellular gene expression and spatial location. In complex biological tissues, non-linear relationships could be present, and these would also vary across organ systems and species. Similarly, the regularization parameters can be tuned to balance sparsity and smoothness adequately, but the chosen values may not hold uniformly across different tissue types or data quality levels. The model also seems to assume independent, normally distributed errors and linear additive effects, a simplification that may overlook the overdispersion or heteroscedasticity commonly observed in RNA-seq data.

      Thank you for this comment. We acknowledge that non-linear relationships can be present in complex tissues and may not be fully captured by a linear model.

      Our choice of a linear model was guided by an investigation of the relationship in the current datasets, which include intestinal villus, mouse brain, and fly embryo.

      There is a linear correlation between expression distance and physical distance [Nitzan et al.]: within a given anatomical structure, cells in closer proximity exhibit more similar expression patterns. In tissues where non-linear relationships are more prevalent, such as the human PDAC sample, our mapping results remain robust. We acknowledge that we have not yet tested our algorithm in highly heterogeneous tissues such as the liver, and we plan to include such analyses in future work if necessary. Regarding the regularization parameters, we agree that the balance between sparsity and smoothness is sensitive to tissue-specific variation and data quality. In our current implementation, we explored a range of values to identify robust defaults.

      (2) The performance of glmSMA is likely sensitive to the number and quality of features used. With too few features, the model may struggle to anchor cells correctly due to insufficient discriminatory power, whereas too many features could lead to overfitting unless appropriately regularized. The manuscript briefly acknowledges this issue, but further systematic evaluation of how varying feature numbers affect mapping accuracy would strengthen the claims, particularly in settings where marker gene availability is limited. A simple way to show some of this would be testing on multiple spatial omics (imaging-based) platforms with varying panel sizes and organ systems. Related to this, based on the figures, it also seems like the performance varies by cell type. What are the factors that contribute to this? Variability in expression levels, RNA quantity/quality? Biases in the panel? Personally, I am also curious how this model can be used similarly/differently if we have a FISH-based, high-plex reference atlas. Additional explanation around these points would be helpful for the readers.

      Thank you for this thoughtful comment. The performance of our method is indeed sensitive to the number and quality of selected features. To optimize feature selection, we employed multiple strategies, including Moran’s I statistic, identification of highly variable genes, and the Seurat pipeline to detect anchor genes linking the spatial transcriptomics data with the reference atlas. The number of selected markers depends on the quality of the data. For high-quality datasets, fewer than 100 markers are typically sufficient for accurate prediction. To address this more clearly, we will revise the manuscript to include detailed descriptions of our feature selection process and demonstrate how varying the number of selected features impacts performance.
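      Moran's I is one of the feature-selection criteria named above. It scores how strongly a gene's expression is spatially autocorrelated: values near +1 indicate smooth spatial patterns, values near -1 indicate checkerboard-like alternation, and the expectation under no autocorrelation is -1/(n-1). The following minimal numpy sketch computes global Moran's I. The choice of spatial weight matrix (plain neighbour adjacency in the example) is an assumption. Genes could then be ranked by I to pick spatially informative features.

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I of values x (one per spot) under spatial weights W.

    W is an (n, n) nonnegative weight matrix with a zero diagonal.
    """
    x = np.asarray(x, dtype=float)
    z = x - x.mean()                 # deviations from the mean
    n = len(x)
    return (n / W.sum()) * (z @ W @ z) / (z @ z)
```

      On a chain of 10 spots with adjacent spots as neighbours, a linear gradient scores strongly positive, while a strictly alternating 0/1 pattern scores -1.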

      We evaluated our method across diverse tissue types and platforms—including Slide-seq, 10x Visium, and Virtual-FISH—which represent both sequencing-based and imaging-based spatial transcriptomics technologies. Our model consistently achieved strong performance across these settings. It's worth noting that the performance of other methods, such as CellTrek [Wei et al] and novoSpaRc [Nitzan et al], also depends heavily on feature selection. In particular, performance degrades substantially when fewer features are used.

      We do not believe that the observed performance is directly influenced by cell type composition. Major cell types are typically well-defined, and rare cell types comprise only a small fraction of the dataset. For these rare populations, a single misclassification can disproportionately impact metrics like KL divergence due to small sample size. However, this does not necessarily indicate a systematic cell type–specific bias in the mapping. To mitigate this issue, we will implement shuffling and sampling procedures to reduce potential bias introduced by rare cell types.

      (3) Application 3 (spatial communication) in the graphical abstract appears relatively underdeveloped. While it is clear that the model infers spatial proximities, further explanation of how these mappings translate into insights into cell-cell communication networks would enhance the biological relevance of the findings.

      Thank you for this valuable feedback. We agree that further elaboration on the connection between spatial proximity and cell–cell communication would enhance the biological interpretation of our results. While our current model focuses on inferring spatial relationships, we may add cell–cell communication analyses in future work.

      (4) What is the final resolution of the model outputs? I am assuming this is dictated by the granularity of the reference atlas and the imposed sparsity via the L1 norm, but if there are clear examples that would be good. In figures (or maybe in practice too), cells seem to be assigned to small, contiguous patches rather than pinpoint single-cell locations, which is a pragmatic compromise given the inherent limitations of current spatial transcriptomics technologies. Clarification on the precise spatial scale (e.g., pixel or micrometer resolution) and any post-mapping refinement steps would be beneficial for the users to make informed decisions on the right bioinformatic tools to use.

      Thank you for the comment. For each cell, our algorithm generates a probability vector over spatial locations, together with their coordinates, indicating the cell's likely spatial assignment. We will report the resolution and the number of cells assigned to each spot in future versions. In our framework, each cell is mapped to one or more spatial locations with associated probabilities. Depending on the strength of the L1 and L2 regularization, a cell may be localized to a small patch or distributed over a broader domain. For the 10x Visium data, we applied a repelling algorithm to enhance visualization [Wei et al.]: if a cell's original location is already occupied, it is reassigned to a nearby location to avoid overlap. Users can also inspect the entire regularization path by varying the penalty terms.
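      The reassignment idea described above (moving a cell to a nearby free location when its preferred spot is taken) can be sketched as a greedy placement. This is a simplified stand-in written for illustration only. It is not the repelling procedure of Wei et al. The one-cell-per-spot capacity and the name `repel_assign` are assumptions.

```python
import numpy as np

def repel_assign(preferred, coords):
    """Greedy one-cell-per-spot placement.

    preferred: preferred spot index for each cell, processed in order
    coords:    (n_spots, n_dims) spot coordinates
    If a cell's preferred spot is already occupied, the cell is moved to the
    nearest unoccupied spot (ties go to the lowest spot index).
    """
    coords = np.asarray(coords, dtype=float)
    if len(preferred) > len(coords):
        raise ValueError("more cells than spots")
    occupied = np.zeros(len(coords), dtype=bool)
    placed = []
    for spot in preferred:
        if occupied[spot]:
            d = np.linalg.norm(coords - coords[spot], axis=1)
            d[occupied] = np.inf              # only free spots are candidates
            spot = int(np.argmin(d))
        occupied[spot] = True
        placed.append(int(spot))
    return placed
```

      For example, three cells that all prefer the middle spot of a five-spot row end up on that spot and its two immediate neighbours.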

      Nitzan M, Karaiskos N, Friedman N, Rajewsky N. Gene expression cartography. Nature. 2019;576(7785):132-137. doi:10.1038/s41586-019-1773-3

      Wei R, et al. Spatial charting of single-cell transcriptomes in tissues. Nature Biotechnology. 2022;40(8):1190-1199. doi:10.1038/s41587-022-01233-1

      Reviewer #2 (Public review):

      Summary:

      The author proposes a novel method for mapping single-cell data to specific locations with higher resolution than several existing tools.

      Thank you for recognizing our contribution. Our goal was to develop a method that achieves higher spatial resolution in mapping single-cell data compared to existing tools. We are encouraged by the results and will continue to refine the approach to improve accuracy and generalizability across platforms and tissue types.

      Strengths:

      The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus.

      Thank you for this comment. We believe that evaluating our method across diverse tissue types—such as the mouse cortex, human PDAC, and intestinal villus—demonstrates its robustness and broad applicability. We plan to continue expanding these evaluations to additional tissue contexts and species to further validate the method’s generalizability.

      Weakness:

      (1) Although the researchers claim that glmSMA seamlessly accommodates both sequencing-based and image-based spatial transcriptomics (ST) data, their testing primarily focused on sequencing-based ST data, such as Visium and Slide-seq. To demonstrate its versatility for spatial analysis, the authors should extend their evaluation to imaging-based spatial data.

      Thank you for the comment. We have tested our algorithm on the virtual FISH dataset from the fly embryo, which serves as an example of image-based spatial omics data. However, such datasets often contain a limited number of available genes. To address this, we will conduct additional testing on image-based data if needed. The Allen Brain Atlas provides high-quality ISH data, and we can select specific brain regions from this resource to further evaluate our algorithm if necessary [Lein et al]. Currently, we plan to focus more on the 10x Visium platform, as it supports whole-transcriptome profiling and offers a wide range of tissue samples for analysis.

      (2) The definition of "ground truth" for spatial distribution is unclear. A more detailed explanation is needed on how the "ground truth" was established for each spatial dataset and how it was utilized for comparison with the predicted distribution generated by various spatial mapping tools.

      Thank you for the comment. To clarify how ground truth is defined across different tissues, we provide the following details. Direct ground truth for cell locations is often unavailable in scRNA-seq data due to experimental constraints. To address this, we adopted alternative strategies for estimating ground truth in each dataset:

      - 10x Visium Data: We used the cell type distribution derived from spatial transcriptomics (ST) data as a proxy for ground truth. We then computed the KL divergence between this distribution and our model's predictions for performance assessment.

      - Slide-seq Data: We validated predictions by comparing the expression of marker genes between the reconstructed and original spatial data.

      - Fly Embryo Data: We used predicted cell locations from novoSpaRc as a reference for evaluating our algorithm.
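      The KL-divergence comparison used for the 10x Visium data can be sketched as follows. This version assumes that both the reference and the predicted cell-type compositions are given as counts over the same ordered list of cell types. It also assumes that divergence is measured as KL(reference || predicted) with a small pseudocount to avoid log(0). The direction, the pseudocount, and the function name are assumptions, not details taken from the manuscript. Because the pseudocount only guards against empty categories, small counts in rare cell types still drive the value, as discussed above.

```python
import numpy as np

def kl_divergence(ref_counts, pred_counts, eps=1e-9):
    """KL(P || Q) between reference (P) and predicted (Q) cell-type compositions.

    Counts are converted to proportions; eps is a pseudocount that keeps
    empty categories from producing log(0) or division by zero.
    """
    p = np.asarray(ref_counts, dtype=float) + eps
    q = np.asarray(pred_counts, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```

      Shifting one cell out of a rare category (2 vs 1 out of 100) gives a noticeably larger divergence than shifting one cell between two common categories (40 vs 39 out of 100).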

      These strategies allowed us to evaluate model performance even in the absence of direct cell location data. In addition, we can apply multiple evaluation strategies within a single dataset.

      (3) In the analysis of spatial mapping results using intestinal villus tissue, only Figure 3d supports their findings. The researchers should consider adding supplemental figures illustrating the spatial distribution of single cells in comparison to the ground truth distribution to enhance the clarity and robustness of their investigation.

      Thank you for the comment. We will include additional details for this dataset in the supplementary figures. As the intestinal villus is a relatively simple tissue, most existing algorithms performed well on it. For this reason, we did not initially provide extensive details in the main text.

      (4) The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus. However, the original anatomical regions are not displayed, making it difficult to directly compare them with the predicted mapping results. Providing ground truth distributions for each tested tissue would enhance clarity and facilitate interpretation. For instance, in Figure 2a and Supplementary Figures 1 and 2, only the predicted mapping results are shown without the corresponding original spatial distribution of regions in the mouse cortex. Additionally, in Figure 3c, four anatomical regions are displayed, but it is unclear whether the figure represents the original spatial regions or those predicted by glmSMA. The authors are encouraged to clarify this by incorporating ground truth distributions for each tissue.

      Thank you for the comment. To improve visualization, we will include anatomical structures alongside the mapping results in the next version, wherever such structures are available (e.g., mouse brain cortex, human PDAC sample, etc.). Regions will be color-coded to enhance clarity and make the spatial organization easier to interpret.

      (5) The cell assignment results from the mouse hippocampus (Supplementary Figure 6) lack a corresponding ground truth distribution for comparison. DG and CA cells were evaluated solely based on the gene expression of specific marker genes. Additional analyses are needed to further validate the robustness of glmSMA's mapping performance on Slide-seq data from the mouse hippocampus.

      Thank you for the comment. The ground truth for DG and CA cells was not available. To better evaluate the model's performance, we will compute the KL divergence between the original and predicted cell type distributions, following the same approach used for the 10x Visium dataset.

      (6) The tested spatial datasets primarily consist of highly structured tissues with well-defined anatomical regions, such as the brain and intestinal villus. Further evaluation on tissues whose anatomical regions are not distinctly separated, such as the liver, would help determine the method's broader applicability.

      Thank you for the comment. We have already tested our algorithm on the fly embryo, where anatomical structures are not well defined or clearly separated. If needed, we can further apply glmSMA to more complex tissues such as the liver. To clarify the role of anatomical structures in our model: glmSMA does not require anatomical information as input. Instead, it leverages a distance matrix between cells to apply L2 norm regularization. Despite the absence of anatomical information, the model still demonstrates strong performance. We will include results to illustrate its effectiveness without anatomical input. Additionally, we plan to evaluate the model on tissues where anatomical regions are not clearly delineated.

      Lein E, Hawrylycz M, Ao N, et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature. 2007;445:168-176. doi:10.1038/nature05453

      Reviewer #3 (Public review):

      Summary:

      The authors aim to develop glmSMA, a network-regularized linear model that accurately infers spatial gene expression patterns by integrating single-cell RNA sequencing data with spatial transcriptomics reference atlases. Their goal is to reconstruct the spatial organization of individual cells within tissues, overcoming the limitations of existing methods that either lack spatial resolution or sensitivity.

      Strengths:

      (1) Comprehensive Benchmarking:

      Compared against CellTrek and novoSpaRc, glmSMA consistently achieved lower Kullback-Leibler (KL) divergence scores, indicating better cell assignment accuracy.

      Outperformed CellTrek in mouse cortex mapping (90% accuracy vs. CellTrek's 60%) and provided more spatially coherent distributions.

      (2) Experimental Validation with Multiple Real-World Datasets:

      The study used multiple biological systems (mouse brain, Drosophila embryo, human PDAC, intestinal villus) to demonstrate generalizability.

      Validation through correlation analyses (Pearson's coefficient) and KL divergence supports the accuracy of glmSMA's predictions.

      We thank reviewer #3 for their positive feedback and thoughtful recommendations.

      Weaknesses:

      (1) The accuracy of glmSMA depends on the selection of marker genes, which might be limited by current FISH-based reference atlases.

      We agree that the accuracy of glmSMA is influenced by the selection of marker genes, and that current FISH-based reference atlases may offer a limited gene set. To address this, we incorporate multiple feature selection strategies, including highly variable genes and spatially informative genes (e.g., via Moran’s I), to optimize performance within the available gene space. As more comprehensive reference atlases become available, we expect the model’s accuracy to improve further.

      (2) glmSMA operates under the assumption that cells with similar gene expression profiles are likely to be physically close to each other in space, which may not hold in various heterogeneous environments.

      While this assumption effectively captures spatial continuity in many cases, we acknowledge that it may not hold across all biological contexts. To address this, we plan to refine our regularization strategy and evaluate the model's performance in heterogeneous tissue regions.

    1. Author response:

      Public Reviews

      Reviewer #1 (Public review):

      Summary:

      Kwon et al. present a very well-conducted and well-written sieve analysis of rotavirus infections in a passive surveillance network in the US, considering how relative vaccine efficacy changes with genetic distance from the vaccine strains across the whole genome. The results are compelling, supported by a number of sensitivity analyses, and the manuscript is generally easy to follow.

      Strengths:

      (1) The underlying study base, a surveillance network across multiple sites in the US.

      (2) The use of a test-negative design, which is well established for rotavirus, to estimate vaccine efficacy.

      (3) The use of genetic distance to measure differences between infecting and vaccine strains, and the innovative use of k-means clustering to make results more interpretable.

      (4) The secondary and sensitivity analyses that provide additional context and support for the primary findings.
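      In a test-negative design, vaccine effectiveness is conventionally estimated as one minus the odds ratio of vaccination among test-positive cases versus test-negative controls. In an adjusted logistic regression, that odds ratio is exp(beta) for the vaccination coefficient. The following minimal sketch shows both forms. In a sieve analysis the same calculation would be repeated for each case group (for example each genetic-distance cluster j) against the shared controls. The function names and the counts in the example are hypothetical.

```python
import math

def ve_from_counts(vacc_cases, unvacc_cases, vacc_controls, unvacc_controls):
    """Unadjusted test-negative VE = 1 - OR, from a 2x2 table of counts."""
    odds_ratio = (vacc_cases * unvacc_controls) / (unvacc_cases * vacc_controls)
    return 1.0 - odds_ratio

def ve_from_logit(beta):
    """Adjusted VE from the logistic-regression coefficient for vaccination."""
    return 1.0 - math.exp(beta)
```

      With hypothetical counts of 20 vaccinated and 80 unvaccinated cases against 60 vaccinated and 40 unvaccinated controls, the odds ratio is 1/6 and the estimated VE is about 83%.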

      Weaknesses:

      (1) As identified by the authors, there is a limited sample size for the analysis of RV1 (monovalent rotavirus vaccine).

      (2) Sieve analyses were originally designed for randomized trials, in which setting their key assumptions are more likely to be met. There is little discussion in this paper of how those assumptions might be violated and what effect that might have on the results. The authors have access to some important confounders, but I believe some more discussion on potential biases in this observational study is warranted.

      We appreciate the reviewer’s positive comments and the opportunity to discuss the application of sieve analysis in observational vaccine effectiveness studies, contrasting it with its traditional use in clinical trials assessing vaccine efficacy. We fully acknowledge the reviewer's point that sieve analysis was originally developed for, and is most frequently employed in, randomized controlled trials (RCTs).

      Sieve analysis, as defined by Gilbert et al. (2001), rests on the following core assumptions: (A1) uniform susceptibility to infection for all participants except for vaccine-induced strain-specific effects; (A2) an equal distribution of exposure to each strain s = 1, …, K between vaccine groups; and (A3) constant strain prevalence. RCTs ensure these through randomization. However, our observational design is vulnerable to violations of these assumptions, especially A1 and A3. To address A1 and A3, we adjusted for age (in years), sample collection year, and clinical setting (i.e., outpatient, inpatient, ED), aiming to account for both individual-level and temporal variations.

      A2 is particularly challenging in observational settings. We found that study site was correlated with both vaccination status (main predictor) and the strain distribution, potentially violating A2. However, adjusting for study site reversed the expected association. Upon further reflection, we realized that the site-specific differences in strain distributions likely reflect the population-level effect of vaccination, which we believe outweighs the potential confounding by study site as an independent cause of both individual-level vaccination status and strain distributions irrespective of vaccination. Thus, adjusting for site would have obscured this genuine population-level effect, and therefore we elected not to do so. We will include further discussion of this point in the revised manuscript.

      Our study demonstrates the unique capacity of sieve analysis to disentangle individual- and population-level effects on vaccine effectiveness in observational settings. We will expand on these considerations, including the potential biases inherent to observational studies and the rationale for our analytical choices, within the discussion section of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This study introduces a new metric for assessing the efficacy of rotavirus vaccines through the genetic distance clustering of strains. The authors analyzed variations in vaccine protection using whole genome sequencing.

      Strengths:

      Evaluating vaccine efficacy using whole genome sequencing can enhance our understanding of how pathogen evolution influences disease transmission and control.

      Weaknesses:

      While the study proposed a new method for evaluating vaccine efficacy using genetic information, its weaknesses arise from the insufficient evidence that analyses based on whole genome sequencing are more reliable than those that rely solely on VP7 and VP4 genotypes.

      Though most cases received the RV5 vaccine (n=119 compared to n=30 for RV1), Figure 2 and the primary focus of the paper concentrate on RV1, as the authors identified a stronger association with genetic distance.

      Additionally, it is unclear whether the difference between the two groups (j=0 versus j=1) is statistically significant for the analysis based on genetic distance to the RV1 strain, as well as for that based on minimum genetic distance to any of the RV5 vaccine strains. In both cases, the confidence intervals show substantial overlap.

      The authors do not seem to have used a criterion for model selection based on the number of clusters; therefore, k=2 may not represent the optimal number of clusters, particularly in relation to the genetic distance associated with the RV5 vaccine (Figure 1B), which does not appear to show a bimodal distribution.

      Finally, outcomes for RV1 are highly associated with both homotypic and heterotypic antibody responses (Supplemental Figure 10), which have already been shown to impact vaccine effectiveness (The Pediatric Infectious Disease Journal 40(12):p 1135-1143, 2021, doi:10.1097/INF.0000000000003286). Given this strong association, the benefit of using genetic distance is unclear, as the GxPx genotype serves as a good proxy for genetic similarity. 

      We sincerely appreciate the reviewer's careful consideration of our manuscript and their constructive suggestions for improvement.

      Regarding the comparison of whole-genome sequencing with traditional VP7/VP4 genotyping, we concur that a more explicit comparison would strengthen our findings. To this end, we plan to incorporate the direct comparison of genetic distance (GD) and genotype-specific vaccine effectiveness (VE) analyses into the main text. Additionally, we will conduct an analysis of VE based on homotypic, partially heterotypic, and fully heterotypic genotype groupings. This will provide a clearer demonstration of the potential added value of GD in refining VE estimates, particularly for future applications. Given the potential for reassortment among the rotavirus gene segments, our analysis highlights that relying solely on the VP7/VP4 genotype can at times be misleading. 

      Regarding k-means clustering, we wish to clarify that the selection of k=2 was not arbitrary. It was determined using the elbow method on the total within-cluster sum of squares (using the fviz_nbclust function in the factoextra R package, with n = 5000 bootstrap samples). While we acknowledge that other methods, such as the silhouette and gap statistics, may yield different optimal cluster numbers, we prioritized maximizing group sample sizes. We will explicitly state this model selection criterion in the methods section of the revised manuscript.
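      The elbow criterion works as follows. Compute the total within-cluster sum of squares (WSS) for k = 1, 2, 3, … and look for the k beyond which extra clusters stop reducing WSS substantially. The following minimal Python sketch applies this to one-dimensional genetic distances using Lloyd's algorithm. The quantile-based initialization, the function names, and the synthetic bimodal data in the check are illustrative assumptions. This sketch does not reproduce the fviz_nbclust/factoextra settings or bootstrapping used in the analysis.

```python
import numpy as np

def kmeans_1d(x, k, n_iter=100):
    """Lloyd's k-means for 1-D data, initialized at evenly spaced quantiles.

    Returns (labels, centers, wss), where wss is the total within-cluster
    sum of squared distances to the assigned centers.
    """
    x = np.asarray(x, dtype=float)
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Move each center to its cluster mean; keep it in place if the cluster is empty
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    wss = float(np.sum((x - centers[labels]) ** 2))
    return labels, centers, wss

def wss_curve(x, k_max=5):
    """WSS for k = 1..k_max; the 'elbow' is where the decrease levels off."""
    return [kmeans_1d(x, k)[2] for k in range(1, k_max + 1)]
```

      For clearly bimodal distances, almost all of the reduction in WSS occurs between k = 1 and k = 2, which is the pattern the elbow method reads as support for two clusters.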

      We acknowledge the reviewer's concern regarding the overlapping confidence intervals and the statistical significance of the difference between the VE estimates for the j=0 and j=1 groups. One way to address this would be to modify our analysis. Instead of two separate logistic regression models (controls vs j=0 cases, and controls vs j=1 cases), we could employ a multinomial logistic regression model with three categories (controls as the reference, j=0 cases, and j=1 cases) and then conduct a Wald test to directly compare the vaccination coefficients for the j=0 and j=1 cases, each estimated relative to controls. We intend to explore this approach in the revised manuscript; it will provide a more rigorous assessment of differences in VE by accounting for the relationship between groups within a single model.
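      Because VE_j = 1 - exp(beta_j), testing equal VE across the two case groups amounts to testing beta_0 = beta_1. Within a single fitted multinomial model, the Wald statistic for that difference uses both coefficient variances and their covariance, taken from the model's variance-covariance matrix. This covariance term is exactly what two separate models cannot supply. The following minimal sketch takes those quantities as inputs. The function name is hypothetical, and extracting the estimates from the fitted model is left to the modelling software.

```python
import math

def wald_diff(b0, b1, var0, var1, cov01):
    """Wald test of H0: b0 == b1 for two coefficients from the same fitted model.

    var0, var1: sampling variances of b0 and b1; cov01: their covariance.
    Returns (W, p), where W follows a chi-square distribution with 1 df under H0.
    """
    se2 = var0 + var1 - 2.0 * cov01       # Var(b0 - b1)
    W = (b0 - b1) ** 2 / se2
    p = math.erfc(math.sqrt(W / 2.0))     # upper tail of chi-square(1)
    return W, p
```

      As a sanity check, a difference of 1.96 standard errors gives W ≈ 3.84 and p ≈ 0.05.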

      Reviewer #3 (Public review):

      Overall, this is an outstanding paper. It presents a novel approach to estimating rotavirus vaccine efficacy; is clearly written and presented; and has implications for this vaccine specifically as well as for type-specific vaccine evaluation more generally. The analytical framework is creative, and the data and statistical approaches are used rigorously. It has long been argued that rotavirus immunity/vaccine performance operates beyond the scale of G/P genotyping. This paper is the first to demonstrate that convincingly, using data on all 11 viral genes and whole genome sequence analysis. I have only minor comments that I recommend should be addressed.

      We sincerely thank the reviewer for their highly positive assessment of our manuscript. We will carefully address their minor comments and incorporate their recommendations in the revised manuscript, which we believe will further enhance the clarity and impact of our study.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The paper by Fournier et al. investigates the sensitivity of neural circuits to changes in intrinsic and synaptic conductances. The authors use models of the stomatogastric ganglion (STG) to compare how perturbations to intrinsic and synaptic parameters impact network robustness. Their main finding is that changes to intrinsic conductances tend to have a larger impact on network function than changes to synaptic conductances, suggesting that intrinsic parameters are more critical for maintaining circuit function.

      The paper is well-written and the results are compelling, but I have several concerns that need to be addressed to strengthen the manuscript. Specifically, I have two main concerns:

      (1) It is not clear from the paper what the mechanism is that leads to the importance of intrinsic parameters over synaptic parameters.

      (2) It is not clear how general the result is, both within the framework of the STG network and its function, and across other functions and networks. This is crucial, as the title of the paper appears very general.

      I believe these two elements are missing in the current manuscript, and addressing them would significantly strengthen the conclusions. Without a clear understanding of the mechanism, it is difficult to determine whether the results are merely anecdotal or if they depend on specific details such as how the network is trained, the particular function being studied, or the circuit itself. Additionally, understanding how general the findings are is vital, especially since the authors claim in the title that "Circuit function is more robust to changes in synaptic than intrinsic conductances," which suggests a broad applicability.

      I do not wish to discourage the authors from their interesting result, but the more we understand the mechanism and the generality of the findings, the more insightful the result will be for the neuroscience community.

      Major comments

      (1) Mechanism

      While the authors did a nice job of describing their results, they did not provide any mechanism for why synaptic parameters are more resilient to changes than intrinsic parameters. For example, from Figure 5, it seems that there is mainly a shift in the sensitivity curves. What is the source of this shift? Can something be changed in the network, the training, or the function to control it? This is just one possible way to investigate the mechanism, which is lacking in the paper.

      (2) Generality of the results within the framework of the STG circuit

      (a) The authors did show that their results extend to multiple networks with different parameters (the 100 networks). However, I am still concerned about the generality of the results with respect to the way the models were trained. Could it be that something in the training procedure makes the synaptic parameters more robust than intrinsic parameters? For example, the fact that duty cycle error is weighted as it is in the cost function (large beta) could potentially affect the parameters that are more important for yielding low error on the duty cycle.

      (b) Related to (a), I can think of a training scheme that could potentially improve the resilience of the network to perturbations in the intrinsic parameters rather than the synaptic parameters. For example, in machine learning, methods like dropout can be used to make the network find solutions that are robust to changes in parameters. Thus, in principle, the results could change if the training procedure for fitting the models were different, or by using a different optimization algorithm. It would be helpful to at least mention this limitation in the discussion.

      (3) Generality of the function

      The authors test their hypothesis based on the specific function of the STG. It would be valuable to see if their results generalize to other functions as well. For example, the authors could generate non-oscillatory activity in the STG circuit, or choose a different, artificial function, maybe with different duty cycles or network cycles. It could be that this is beyond the scope of this paper, but it would be very interesting to characterize which functions are more resilient to changes in synapses, rather than intrinsic parameters. In other words, the authors might consider testing their hypothesis on at least another 'function' and also discussing the generality of their results to other functions in the discussion.

      (4) Generality of the circuit

      The authors have studied the STG for many years and are pioneers in their approach, demonstrating that there is redundancy even in this simple circuit. This approach is insightful, but it is important to show that similar conclusions also hold for more general network architectures, and if not, why. In other words, it is not clear if their claim generalizes to other network architectures, particularly larger networks. For example, one might expect that the number of parameters (synaptic vs intrinsic) might play a role in how resilient the function is with respect to changes in the two sets of parameters. In larger models, the number of synaptic parameters grows as the square of the number of neurons, while the number of intrinsic parameters increases only linearly with the number of neurons. Could that affect the authors' conclusions when we examine larger models?
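      The reviewer's scaling argument can be made concrete with a short count. This is a sketch under two assumptions that are not taken from the model studied here. First, connectivity is all-to-all without autapses, so there is one synaptic maximal conductance per ordered pair of distinct neurons. Second, each neuron has a fixed number `k` of intrinsic maximal conductances; the value 8 used below is only illustrative.

```python
from typing import Tuple


def parameter_counts(n_neurons: int, n_intrinsic_per_neuron: int) -> Tuple[int, int]:
    """Return (synaptic, intrinsic) parameter counts.

    Assumes all-to-all connectivity without autapses (one synaptic maximal
    conductance per ordered pair of distinct neurons) and a fixed number of
    intrinsic maximal conductances per neuron.
    """
    synaptic = n_neurons * (n_neurons - 1)        # grows ~ N^2
    intrinsic = n_neurons * n_intrinsic_per_neuron  # grows ~ N
    return synaptic, intrinsic


for n in (3, 10, 100):
    syn, intr = parameter_counts(n, 8)  # k = 8 is an illustrative choice
    print(f"N={n:>3}: synaptic={syn:>5}, intrinsic={intr:>4}")
```

      With these assumptions, synaptic parameters are outnumbered by intrinsic ones in a three-cell circuit but dominate by an order of magnitude at N = 100. This imbalance is the reason the reviewer expects the comparison might change in larger networks.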

      In addition, how do the authors' conclusions depend on the "complexity" of the non-linear equations governing the intrinsic parameters? Would the same conclusions hold if the intrinsic parameters only consisted of fewer intrinsic parameters or simplified ion channels? All of these are interesting questions that the authors should at least address in the discussion.

      We thank Reviewer #1 for their valuable input. We agree with the reviewer that the generality of the results may have been overstated. To address this, we changed the title of the manuscript to make it more specific to rhythmic circuits, and we included a sentence to this effect in the Discussion.

      (1) We were more interested in knowing which set of conductances is more robust in a population of models, rather than a mechanism. If such a mechanism exists it will be the subject of a different study.

      (2) (a) It is impossible to explore the whole parameter space of these models. Our method for finding circuits will leave subsets of circuits out of the study. Our sole goal in constructing the model database was that the activities were similar but the conductances were different. (b) Of course, one could devise a cost function targeting circuits that are more or less robust to changes in one parameter. Whether such circuits exist is a different matter. This is not what we intended to do.

      (3) For this we would need a different circuit that produces non-oscillatory activity. A normal pyloric rhythm circuit always produces oscillatory activity unless it is “crashed” either by temperature or by perturbations. Even in that case, because we don’t have a proper “control” activity (circuits crash in different ways), we would not be able to use the same approach.

      We think it is a valuable idea to perform a similar study in another small circuit with nonoscillatory (or rhythmic) activities. 

      (4) We did not explore the issue of how our results generalize to larger networks as it would be pure speculation. It could be potentially interesting to do a similar sensitivity analysis with a large network trained to perform a simple task. Our understanding is that many large trained networks are extremely sensitive to perturbations in synaptic weights, at the same time that the intrinsic properties of neurons in ANN are typically oversimplified and identical across units. 

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents an important exploration of how intrinsic and synaptic conductances affect the robustness of neural circuits. This is a worthwhile question, and overall, the manuscript is well written and has a logical progression.

      The focus on intrinsic plasticity as a potentially overlooked factor in network dynamics is valuable. However, while the stomatogastric ganglion (STG) serves as a well-characterized and valuable model for studying network dynamics, its simplified structure and specific dynamics limit the generalizability of these findings to more complex systems, such as mammalian cortical microcircuits.

      Strengths:

      Clean and simple model. Simulations are carefully carried out and parameter space is searched exhaustively.

      Weaknesses:

      (1) Scope and Generalizability:

      The study's emphasis on intrinsic conductance is timely, but with its minimalistic and unique dynamics, the STG model poses challenges when attempting to generalize findings to other neural systems. This raises questions regarding the applicability of the results to more complex circuits, especially those found in mammalian brains and those where the dynamics are not necessarily oscillating. This is even more so (as the authors mention) because synaptic conductances in this study are inhibitory, and changes to their synaptic conductances are limited (as the driving force for the current is relatively low).

      (2) Challenges in Comparison:

      A significant challenge in the study is the comparison method used to evaluate the robustness of intrinsic versus synaptic perturbations. Perturbations to intrinsic conductances often drastically affect individual neurons' dynamics, as seen in Figure 1, where such changes result in single spikes or even the absence of spikes instead of the expected bursting behavior. This affects the input to downstream neurons, leading to circuit breakdowns. For a fair comparison, it would be essential to constrain the intrinsic perturbations so that each neuron remains within a particular functional range (e.g., maintaining a set number of spikes). This could be done by setting minimal behavioral criteria for neurons and testing how different perturbation limits impact circuit function.

      (3) Comparative Metrics for Perturbation:

      Another notable issue lies in the evaluation metrics for intrinsic and synaptic perturbations. Synaptic perturbations are straightforward to quantify in terms of conductance, but intrinsic perturbations involve more complexity, as changes in maximal conductance result in variable, nonlinear effects depending on the gating states of ion channels. Furthermore, synaptic perturbations focus on individual conductances, while intrinsic perturbations involve multiple conductance changes simultaneously. To improve fairness in comparison, the authors could, for example, adjust the x-axis to reflect actual changes in conductance or scale the data post hoc based on the real impact of each perturbation on conductance. For example, in Figure 6, the scale of the panels of the intrinsic conductances (e.g., g_na-bar) is 500× larger than that of the synaptic conductances (a row below), but the sodium conductance reaches its maximum perhaps only for a brief moment during every spike, and then most of the time it is close to zero. Moreover, changing the sodium conductance over the range of 0-250 for such a nonlinear current is, in many ways, unthinkable. Did you ever measure two neurons with such a difference in the sodium conductance? So, how can we tell that the ranges of the perturbations make a meaningful comparison?

      We thank Reviewer #2 for their comments. We agree with both reviewers about scope and generalizability. We changed the title of the manuscript and included a sentence in the discussion to address this. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 63: Tau_b is tau in Fig 1B? What is the 'network period' tau_n? Both are defined in the methods, but it would be good to clarify here and also in the figure.

      This was fixed. Tau_b is the bursting period, and we indicated it in the figure. Network period means the period of the network activity. This was rewritten.
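      The timing quantities discussed in these items can be computed from a spike train. They are the bursting period τ_b and the burst duration, which is the interval from the first to the last spike of a burst (the “PD off” interval in Reviewer #2’s item 5). This is a minimal sketch that assumes bursts are segmented with a fixed interspike-interval threshold (`max_isi`, a hypothetical parameter). The manuscript’s Methods may use a different burst-detection rule.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def detect_bursts(spike_times: Sequence[float], max_isi: float) -> List[Tuple[float, float]]:
    """Group sorted spike times into bursts.

    A new burst starts whenever the interspike interval exceeds max_isi.
    Returns a list of (first_spike, last_spike) pairs.
    """
    if not spike_times:
        return []
    bursts = []
    start = prev = spike_times[0]
    for t in spike_times[1:]:
        if t - prev > max_isi:
            bursts.append((start, prev))
            start = t
        prev = t
    bursts.append((start, prev))
    return bursts


def burst_features(spike_times: Sequence[float], max_isi: float) -> Tuple[float, float, float]:
    """Return (mean bursting period tau_b, mean burst duration, duty cycle)."""
    bursts = detect_bursts(sorted(spike_times), max_isi)
    if len(bursts) < 2:
        raise ValueError("need at least two bursts to estimate a period")
    # Period: time between the first spikes of consecutive bursts.
    tau_b = mean(b[0] - a[0] for a, b in zip(bursts, bursts[1:]))
    # Duration: first spike to last spike of each burst (the "off" time
    # measured from burst onset).
    duration = mean(last - first for first, last in bursts)
    return tau_b, duration, duration / tau_b
```

      For three bursts at 0, 1 and 2 s, each lasting 0.2 s, this gives τ_b = 1 s and a duty cycle of 0.2.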

      (2) Line 74: "maximal conductances g_i." What is i? I can imagine what you meant, but it would be good to clarify the notation.

      There are multiple different currents. The letter ‘i’ is an index over the different types. It now reads as follows,

      "The activity of the network depends on the values of the maximal conductances ḡ_i, where i is an index corresponding to the different current types (Na, CaS, CaT, Kd, KCa, A, H, Leak, I_MI)"

      (3) Line 78: "conductances are changed by a random amount." How much is the "random amount"? In percentages? 

      We fixed this sentence. This is how it reads now, 

      "The blue trace in Figure 1C corresponds to the activity of the same model when each of the intrinsic conductances is changed by a random amount within a range between 0 (completely removing the conductance) and twice its starting value, 2ḡ_i, or equivalently, an increment of 100%."
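      The perturbation protocol quoted above can be sketched in a few lines. The response says only “a random amount within a range”, so this sketch makes several assumptions: each maximal conductance is redrawn independently and uniformly from [0, 2g_i], and the conductance names and values below are illustrative rather than the published model parameters.

```python
import random
from typing import Dict


def perturb_conductances(
    gbar: Dict[str, float], rng: random.Random, max_fold: float = 2.0
) -> Dict[str, float]:
    """Return a copy of gbar with each maximal conductance redrawn
    uniformly from [0, max_fold * g].

    A draw of 0 removes the conductance; a draw of 2 * g doubles it.
    The uniform distribution is an assumption of this sketch.
    """
    return {name: rng.uniform(0.0, max_fold * g) for name, g in gbar.items()}


# Illustrative values only (not the published model parameters).
baseline = {"Na": 1000.0, "CaS": 10.0, "Kd": 100.0, "H": 0.05}
rng = random.Random(42)
perturbed = perturb_conductances(baseline, rng)
```

      Passing an explicit `random.Random` instance keeps each perturbed circuit reproducible from its seed.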

      Similarly, in Line 87: "by a similar percent." Can you provide Figures 1E-F in percentages? Are the percentages the same?

      The phrase “by a similar percent” is misleading and unimportant. Thank you; we removed it.

      (4) Line 113: Why did you add I_MI? Is it important for the results or for the conclusions?

      I_MI was added because the current is known to be there and it is not more or less important for the results or conclusions than any other current. 

      (5) Line 117: "We used a genetic algorithm to generate a database." Confusing. I guess you meant that you used genetic algorithms to optimize the cost function.

      Thank you for this comment. We fixed this sentence, see below. 

      “We used a genetic algorithm to optimize the cost function, and in this way generated a database of N = 100 models with different values of maximal conductances (Holland 88)."
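      The response does not specify the genetic operators that were used. The following is a generic elitist genetic algorithm (tournament selection, uniform crossover and Gaussian mutation), offered as a sketch and not the authors’ implementation. The cost function in the usage example is a hypothetical toy.

```python
import random
from typing import Callable, List, Sequence, Tuple

Genome = List[float]


def genetic_minimize(
    cost: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 40,
    generations: int = 60,
    mutation_sd: float = 0.05,
    seed: int = 0,
) -> Tuple[Genome, List[float]]:
    """Minimize `cost` over a box with a simple elitist genetic algorithm.

    Returns the best genome of the final generation and the best cost of
    each generation.
    """
    rng = random.Random(seed)

    def random_genome() -> Genome:
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def tournament(pop: List[Genome], costs: List[float], k: int = 3) -> Genome:
        # Pick k individuals at random and keep the one with the lowest cost.
        winner = min(rng.sample(range(len(pop)), k), key=lambda i: costs[i])
        return pop[winner]

    def mutate(genome: Genome) -> Genome:
        # Gaussian mutation scaled to each parameter's range, clipped to bounds.
        return [
            min(max(x + rng.gauss(0.0, mutation_sd * (hi - lo)), lo), hi)
            for x, (lo, hi) in zip(genome, bounds)
        ]

    pop = [random_genome() for _ in range(pop_size)]
    history: List[float] = []
    best: Genome = pop[0]
    for _ in range(generations):
        costs = [cost(g) for g in pop]
        best_i = min(range(pop_size), key=lambda i: costs[i])
        best = pop[best_i]
        history.append(costs[best_i])
        # Elitism: the best genome is copied unchanged into the next generation,
        # so the best cost can never increase from one generation to the next.
        children = [list(best)]
        while len(children) < pop_size:
            a, b = tournament(pop, costs), tournament(pop, costs)
            # Uniform crossover: each gene comes from either parent with p = 0.5.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            children.append(mutate(child))
        pop = children
    return best, history


def toy_cost(g: Sequence[float]) -> float:
    # Hypothetical target: only the sum of the two "conductances" matters.
    return (g[0] + g[1] - 2.0) ** 2


best, history = genetic_minimize(toy_cost, [(0.0, 2.0), (0.0, 2.0)], seed=1)
```

      In the toy cost, every pair with g_0 + g_1 = 2 scores zero, so repeated runs with different seeds return different parameter sets with equally good output. This is a minimal analogue of the degenerate model database described in the response.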

      (6) Line 136: "The models in the database were constrained to produce solutions whose features were similar to the experimental measurements." Why are there differences in the features? Is this an optimization issue? I thought you wanted to claim that there are degenerate solutions, that is, solutions where the parameters are different, but the output is identical. Please clarify.

      The concept of degenerate solutions does not imply that the solutions are mathematically identical. In biology, this means that they provide very similar functions, but do so with different underlying parameters (in this case, maximal conductances). The activity of the pyloric network is slightly different across animals, and it also changes over time within the same individual. Variation across models reflects individual variation in the biological circuit, and it is a strength of our modeling approach. The circuits function equally well because they produce biologically realistic patterns, although the details of the activity patterns show differences.

      (7) Line 139: "distributed (p > 0.05)." What test did you use? N? Similarly, at Lines 218, 241, 239, etc. Please be more rigorous when reporting statistical tests.

      Thank you. We now specify the test we utilized every time we report a p value. 

      (8) Line 143: "In this case, it is not possible to identify clusters, suggesting that there are no underlying relationships between the features in the model database." The 2D plot is misleading, as the features are in 11 dimensions. Claims should be about the 11D space, not projections onto 2D. In fact, I don't think you can rule out correlations between the features based on the 2D plots. For example, shouldn't there be correlations between the on and off phases and the burst durations?

      Thank you. These sentences were confusing and were removed. We added the following sentence to the end of that paragraph.

      "Because the feature vectors are similar, their t-SNE projections do not form groups or clusters."

      (9) Related to this, I don't understand this sentence: "Even though the conductances are broadly distributed over many-fold ranges, the output of the circuits results in tight yet uncorrelated distributions.”

      This sentence is confusing and was removed. 

      (10) Line 158: Repetition of Line 152: Figure 3 shows the currentscapes of each cell in two model networks.

      We removed the second instance of the repeated sentences. 

      (11) Line 160: "yet the activity of the networks is similar." Well, they are similar, but not identical. I can also say that the current scapes are 'similar'. This should be better quantified and not left as a qualitative description.

      While this is an interesting point it will not change the results and conclusions of the present study. The network models are different since the values of their maximal conductances are distributed over wide ranges.  

      (12) Line 218: midpoint parameter? Is that b - the sharpness? Please be consistent. Regarding the mechanism (see above) - any ideas what leads to this shift in the sensitivity curves between the two types of parameters?

      Yes, we made a mistake. ‘b’ is the midpoint parameter. This was fixed in the text, thank you.
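      A sensitivity curve with a midpoint parameter b can be illustrated with a logistic function. The exact parameterization in the manuscript is not reproduced here. This sketch assumes f(x) = 1 / (1 + exp(a(x - b))), where a > 0 is the steepness, b is the midpoint, x is the perturbation size and f is the fraction of circuits that remain pyloric. The parameter values in the usage note are hypothetical.

```python
import math


def sensitivity_sigmoid(x: float, a: float, b: float) -> float:
    """Fraction of circuits still pyloric after a perturbation of size x.

    a > 0 sets the steepness; b is the midpoint, where the fraction is 0.5.
    Written in two branches so that exp() never overflows.
    """
    z = a * (x - b)
    if z > 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def perturbation_at_fraction(p: float, a: float, b: float) -> float:
    """Invert the sigmoid: perturbation size at which a fraction p remains pyloric."""
    return b + math.log(1.0 / p - 1.0) / a
```

      Under this form, a curve whose midpoint b lies further to the right needs a larger perturbation before half of the circuits fail. For example, with hypothetical midpoints of 0.6 (intrinsic) and 1.2 (synaptic) and equal steepness, `sensitivity_sigmoid(0.9, 10, 1.2)` exceeds `sensitivity_sigmoid(0.9, 10, 0.6)`. A shift in b is therefore one compact way to summarize the difference between the two sets of curves.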

      (13) Figure 6 illustrates why synaptic parameters are more robust, but it is not quantified. Why not provide a quantitative measure for this claim? For example, calculate the colored area within the white square for each pair, for each cell, and for each model. Show that these measures can predict improved robustness for one model over another and for synaptic vs. intrinsic parameters.

      The ratio of areas of the colored and non-colored regions in the whole hyperboxes (for intrinsic and synaptic conductances) is the number reported in the y-axis of the sensitivity curves when we include all conductances (and not just a pair). 

      We computed the ratios of the colored/noncolored areas in all panels in figure 6 and now report these quantities as follows, 

      "We computed the proportions of areas of the white boxes that correspond to pyloric activity. These values for the intrinsic conductances panels are PD = 0.58, LP = 0.50, PY = 0.49, and the proportions for the synaptic conductances panels are PDPY = 0.62, PDLP = 0.87, and LPPD = 0.94. The occupied areas for synaptic conductances are larger than in the intrinsic conductances panels, consistent with our finding that the circuits’ activities are more robust to changes in synaptic conductances versus changes in intrinsic conductances."

      "As before, we computed the proportion of areas of pyloric activity within the white boxes: PD = 0.61, LP = 0.55, PY = 0.52, and the proportions for the synaptic conductances panels are PDPY = 0.88, PDLP = 0.87, and LPPD = 0.83. These results provide an intuition of the complexities of G_P. Not only are these regions hard-to-impossible to characterize in one circuit, but they are also different across circuits.”
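      The area proportions reported above can be estimated by treating each two-dimensional panel as a grid of simulated conductance pairs and counting the grid points classified as pyloric. This is a sketch under assumptions. `is_pyloric` stands in for whatever classification criteria the manuscript applies, and the half-plane used in the usage note is a toy. Applying the same count to the full hyperbox of all conductances gives the quantity that the response identifies with the y-axis of the sensitivity curves.

```python
from typing import Callable, Tuple


def grid_area_fraction(
    is_pyloric: Callable[[float, float], bool],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    n: int = 100,
) -> float:
    """Fraction of an n x n grid of cell centres inside the box
    (x_range by y_range) that the classifier labels as pyloric."""
    (x0, x1), (y0, y1) = x_range, y_range
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    hits = 0
    for i in range(n):
        x = x0 + (i + 0.5) * dx
        for j in range(n):
            y = y0 + (j + 0.5) * dy
            if is_pyloric(x, y):
                hits += 1
    return hits / (n * n)
```

      For instance, the toy rule `lambda x, y: x + y < 1.0` on the unit square occupies half of the box, and the grid estimate approaches 0.5 as n grows.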

      (14) Does the sign of the synaptic weights affect the conclusions?

      We did not explore this issue because all chemical synapses in this network are inhibitory.

      (15) Line 492: typo: deltai.

      We fixed this.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 301 - you can also add Williams and Fletcher 2019 Neuron.

      We added the reference. Thank you. 

      (2) Line 316 - this is a strange comment, as these are the exact regions in which intrinsic plasticity was shown (e.g., Losonczy, Attila, Judit K. Makara, and Jeffrey C. Magee. "Compartmentalized dendritic plasticity and input feature storage in neurons." Nature 452.7186 (2008): 436-441).

      We did not understand this comment. 

      (3) I found only one citation for the work of Turrigiano, the most relevant of which is only mentioned in the Method section. This is odd, as her work directly relates how synaptic conductance perturbation results in changes in intrinsic conductance.

      We included more references to the work of Turrigiano to provide more context. 

      Desai, Niraj S., Lana C. Rutherford, and Gina G. Turrigiano. "Plasticity in the intrinsic excitability of cortical pyramidal neurons." Nature Neuroscience 2, no. 6 (1999): 515-520.

      Desai, Niraj S., Sacha B. Nelson, and Gina G. Turrigiano. "Activity-dependent regulation of excitability in rat visual cortical neurons." Neurocomputing 26 (1999): 101-106.

      (4) Line 329 - The list of citations is very limited regarding studies of ext/int balance which started really way before 2009. Please give some of the credit to the classics.

      We included the following additional references.

      Van Vreeswijk, Carl, and Haim Sompolinsky. "Chaos in neuronal networks with balanced excitatory and inhibitory activity." Science 274, no. 5293 (1996): 1724-1726.

      Rubin, Ran, L. F. Abbott, and Haim Sompolinsky. "Balanced excitation and inhibition are required for high-capacity, noise-robust neuronal selectivity." Proceedings of the National Academy of Sciences 114, no. 44 (2017): E9366-E9375.

      Wang, Xiao-Jing. "Macroscopic gradients of synaptic excitation and inhibition in the neocortex." Nature reviews neuroscience 21, no. 3 (2020): 169-178.

      Lo, Chung-Chuan, Cheng-Te Wang, and Xiao-Jing Wang. "Speed-accuracy tradeoff by a control signal with balanced excitation and inhibition." Journal of Neurophysiology 114, no. 1 (2015): 650-661.

      (5) In Figure 1B, why does it say 'OFF' when the neuron is spiking?

      The label indicates the interval of time elapsed between the first spike in the PD neuron (taken as a reference), and the last spike in the burst (PD off). 

      Summary of changes to figures:

      Figure 1:

      Fixed labels indicating bursting period and burst duration.

      Figure 5:

      Added labels in panels C and D specifying the symbol corresponding to the sigmoidal parameter.

      Additional changes

      We changed the title of the manuscript as follows:

      "Rhythmic circuit function is more robust to changes in synaptic than intrinsic conductances."

      We included the following sentence at the end of the Discussion section.

      "We believe our results will hold for other rhythmic circuits and will be relevant for similar studies in other circuits with more complex functions.”

      We realized we made a mistake with the units for maximal conductances. They were incorrectly expressed in nS (nano Siemens) in the figure labels, and correctly expressed in micro Siemens in the methods section. This was fixed and now conductances are expressed in micro Siemens consistently in the manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reply to the comments of the second referee

      We sincerely appreciate the positive evaluation and the useful suggestions on our manuscript.

      (1) The authors identified key metabolites affecting responses to perturbations in two ways: (i) by fixing a metabolite's value and (ii) by performing a sensitivity analysis. It would be helpful for the modeling community to understand better the differences and similarities in the obtained results. Do both methods identify substrate-level regulators? Is freezing a metabolite's dynamics dramatically changing the metabolic response (and if yes, which ones are so different in the two cases)? Does the scope of the network affect these differences and similarities? 

      Thank you for these suggestions. We compared the Sobol’ total sensitivity index with the absolute values of the change in the response coefficient (Figure S6 in the revised manuscript). There is no clear relationship between the two quantities. The Sobol’ sensitivity analysis quantifies how a perturbation on the concentration of a metabolite X contributes to the overall dynamics. On the other hand, the analysis in which metabolites’ concentrations are fixed measures how strongly metabolite X helps propagate the perturbations on the other metabolites throughout the metabolic network. In other words, in the Sobol’ analysis, we evaluate the outcome when the perturbation is applied directly to metabolite X, whereas in the fixing-metabolites analysis, we consider perturbations applied to other metabolites and assess how X influences those perturbations. We believe this conceptual difference explains why the two quantities do not correlate. We suspect that this lack of correlation is independent of the network’s scope, because each method evaluates a different aspect of the system. We would say that both methods capture the effect of a metabolite’s dynamics on the overall dynamics regardless of the mechanism; that is, neither method distinguishes whether a perturbation of the metabolite affects the overall dynamics stoichiometrically (as a reactant) or through substrate-level regulation. Thus, identifying substrate-level regulation with these methods would be challenging.
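      For readers comparing the two analyses, the Sobol’ total index of an input can be estimated with the Jansen estimator. The idea is to resample only that input while keeping the others fixed, then measure how much the output variance changes. This is a generic sketch and not the authors’ pipeline. It assumes independent inputs scaled to the unit interval, and the test function is a hypothetical stand-in for the metabolic model.

```python
import random
from statistics import pvariance
from typing import Callable, List, Sequence


def sobol_total_indices(
    f: Callable[[Sequence[float]], float],
    n_params: int,
    n_samples: int = 10_000,
    seed: int = 0,
) -> List[float]:
    """Jansen estimator of Sobol' total-order indices for inputs ~ U(0, 1)."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    fA = [f(row) for row in A]
    fB = [f(row) for row in B]
    var_y = pvariance(fA + fB)  # total output variance from both sample sets
    indices = []
    for i in range(n_params):
        # A with column i taken from B: only input i is resampled.
        fAB = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        s = sum((ya - yab) ** 2 for ya, yab in zip(fA, fAB)) / (2 * n_samples)
        indices.append(s / var_y)
    return indices


def toy_model(x: Sequence[float]) -> float:
    # Hypothetical stand-in: input 2 has no effect on the output.
    return x[0] + 2.0 * x[1]


S_T = sobol_total_indices(toy_model, 3, n_samples=20_000, seed=1)
```

      For the toy model, the analytical total indices are 0.2, 0.8 and 0. The estimate perturbs each input directly, which is the “perturbation applied to metabolite X” view described above, rather than fixing X and perturbing everything else.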

      (2) Regarding the issues the authors encountered when performing the sensitivity analysis, they can be approached in two ways. First, the authors can check the methods for computing conserved moieties nicely explained by Sauro's group (doi:10.1093/bioinformatics/bti800) and compute them for large-scale networks (but beware of metabolites that belong to several conserved pools). Otherwise, the conserved pools of metabolites can be considered as variables in the sensitivity analysis; grouping multiple parameters is a common approach in sensitivity analysis.

      Thank you for this helpful suggestion. Following the method described in the reference, we have computed the Sobol’ sensitivity index of NADH, NADPH, and Q8H2 (with their counterparts algebraically solved and treated as dependent variables). We have updated Figure S5 accordingly.
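      Conservation relations such as the NADH/NAD+ pool are the left null space of the stoichiometric matrix N, with species as rows and reactions as columns. The sketch below finds that null space with a singular value decomposition. This linear-algebra shortcut is swapped in for illustration and is not the procedure of the cited paper. The three-reaction network is hypothetical.

```python
import numpy as np


def conservation_matrix(N: np.ndarray) -> np.ndarray:
    """Return a matrix L whose rows span the left null space of N.

    N is the stoichiometric matrix (rows = species, columns = reactions).
    Each row l of L satisfies l @ N = 0, so l @ x stays constant in time
    for any concentration vector x obeying dx/dt = N @ v.
    """
    rank = np.linalg.matrix_rank(N)
    # Right singular vectors of N.T with zero singular value span null(N.T).
    _, _, vt = np.linalg.svd(N.T)
    return vt[rank:]


# Hypothetical toy network, species order [NADH, NAD, A, B]:
#   v1: A + NAD -> B + NADH,  v2: NADH -> NAD,  v3: B -> A
N = np.array([
    [ 1.0, -1.0,  0.0],
    [-1.0,  1.0,  0.0],
    [-1.0,  0.0,  1.0],
    [ 1.0,  0.0, -1.0],
])
L = conservation_matrix(N)
```

      The SVD returns an orthonormal basis that may mix the NADH + NAD and A + B pools. Row-reduction methods give sparse integer pool vectors that are easier to interpret. Once a pool total T is known, a dependent species is recovered algebraically, for example NAD = T - NADH, which matches the treatment of counterparts described above.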

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary:

      The authors examine the role of the medial prefrontal cortex (mPFC) in cognitive control, i.e. the ability to use task-relevant information and ignore irrelevant information, in the rat. According to the central-computation hypothesis, cognitive control in the brain is centralized in the mPFC and according to the local hypothesis, cognitive control is performed in task-related local neural circuits. Using the place avoidance task which involves cognitive control, it is predicted that if mPFC lesions affect learning, this would support the central computation hypothesis whereas no effect of lesions would rather support the local hypothesis. The authors thus examine the effect of mPFC lesions in learning and retention of the place avoidance task. They also look at functional interconnectivity within a large network of areas that could be activated during the task by using cytochrome oxidase, a metabolic marker. In addition, electrophysiological unit recordings of CA1 hippocampal cells are made in a subset of (lesioned or intact) animals to evaluate overdispersion, a firing property that reflects cognitive control in the hippocampus. The results indicate that mPFC lesions do not impair place avoidance learning and retention (though flexibility is altered during conflict training), and do not affect cognitive control seen in hippocampal place cell activity (alternation of frame-specific firing), but reduce overdispersion, a measure of location-specific firing variability, in pretraining. The lesions nevertheless have some effect on functional interconnections. The results overall support the local hypothesis.

      Strengths:

      Straightforward hypothesis: clarification of the involvement of the mPFC in the brain is expected and achieved. Appropriate use of fully mastered methods (behavioral task, electrophysiological recordings, measure of metabolic marker cytochrome oxidase) and rigorous analysis of the data. The conclusion is strongly supported by the data. 

      Weaknesses:

      No notable weaknesses in the conception, making of the study, and data analysis. The introduction does not mention important aspects of the work, i.e. cytochrome oxidase measure and electrophysiological recordings. The study is actually richer than expected from the introduction. 

      The revised Introduction now includes:

      “We used cytochrome oxidase, a metabolic marker of baseline neuronal activity, to confirm the mPFC lesions were effective and that there are non-local network consequences despite the local lesion. We first evaluated cytochrome oxidase activity in regions known to be associated with performance in the active place avoidance task, or regions with known connectivity to the mPFC. We then evaluated covariance of activity amongst the regions in an effort to detect network consequences of the lesion.”

      Reviewer #2 (Public review): 

      Park et al. set out to test two competing hypotheses about the role of the medial prefrontal cortex (PFC) in cognitive control, the ability to use task-relevant cues and ignore task-irrelevant cues to guide behavior. The "central computation" hypothesis assumes that cognitive control relies on computations performed by the PFC, which then interacts with other brain regions to accomplish the task. Alternatively, the "local computation" hypothesis suggests that computations necessary for cognitive control are carried out by other brain regions that have been shown to be essential for cognitive control tasks, such as the dorsal hippocampus and the thalamus. If the central computation hypothesis is correct, PFC lesions should disrupt cognitive control. Alternatively, if the local computation hypothesis is correct, cognitive control would be spared after PFC lesions. The task used to assess cognitive control is the active place avoidance task, in which rats must avoid a section of a rotating arena using the stationary room cues and ignoring the local olfactory cues on the rotating platform. Performance on this task has previously been shown to be disrupted by hippocampal lesions, and hippocampal ensembles dynamically represent the room and arena depending on the animal's proximity to the shock zone. They found no group (lesion vs. sham) differences in the three behavioral parameters tested: distance traveled, latency to enter the shock zone, and number of shock zone entries for both the standard task and the "conflict" task in which the shock zone was rotated by 180 degrees. The only significant difference was the savings index; the lesion group entered the new shock zone more often than the sham group during the first 5 minutes of the second conflict session. This deficit was interpreted as a cognitive flexibility deficit rather than a cognitive control failure.
      Next, the authors compared cytochrome oxidase activity between sham and lesion groups in 14 brain regions and found that only the amygdala showed significant elevation in the lesion vs. sham group. Pairwise correlation analysis revealed a striking difference between groups: many correlations between regions were lost in the lesion group (between reuniens and hippocampus, and between reuniens and amygdala), and a correlation between dorsal CA1 and central amygdala appeared in the lesion group that was absent in the sham group. Finally, the authors assessed dorsal hippocampal representations of the spatial frame (arena vs. room) and found no differences between lesion and sham groups. The only difference in hippocampal activity was reduced overdispersion in the lesion group compared to the sham group on the pretraining session only, and this difference disappeared after the task began. Collectively, the authors interpret their findings as supporting the local computation hypothesis; computations necessary for cognitive control occur in brain regions other than the PFC.
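      Overdispersion, the hippocampal measure discussed in these reviews, is commonly computed by comparing the spike count on each pass through a place field with the count expected from the cell’s time-averaged rate map. The variance of the standardized differences is then taken across passes. This is a sketch of that general formulation only. How passes are selected and how expected counts are derived are assumptions here, since neither is given in this excerpt.

```python
import math
from statistics import variance
from typing import Sequence


def overdispersion(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Variance of standardized spike counts across passes through a firing field.

    observed[k]: spikes fired during pass k.
    expected[k]: spikes predicted for pass k from the time-averaged rate map.
    Passes with expected <= 0 are skipped to avoid division by zero.
    """
    z = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected) if e > 0]
    if len(z) < 2:
        raise ValueError("need at least two passes with expected > 0")
    return variance(z)  # sample variance of the z-scores
```

      If firing on each pass were Poisson with the expected rate, the z-scores would have a variance near 1. Values well above 1 indicate the excess pass-to-pass variability that the reviews associate with switching between spatial frames.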

      Strengths:

      (1) The data were collected in a rigorous way with experimental blinding and appropriate statistical analyses. 

      (2) Multiple approaches were used to assess differences between lesion and sham groups, including behavior, metabolic activity in multiple brain regions, and hippocampal single-unit recording.

      Weaknesses:

      (1) Only male rats were used with no justification provided for excluding females from the sample.

      This is a weakness we acknowledge. The experiments were performed at a time when we did not have female rats in the lab.

      (2) The conceptual framework used to interpret the findings was to present two competing hypotheses with mutually exclusive predictions about the impact of PFC lesions on cognitive control. The authors then use mainly null findings as evidence in support of the local computation hypothesis. They acknowledge that some people may question the notion that the active place avoidance task indeed requires cognitive control, but then call the argument "circular" because PFC has to be involved in cognitive control. This assertion does not address the possibility that the active place avoidance task simply does not require cognitive control. 

      We beg to differ that the possibility was not addressed. Prior to making the assertion, the manuscript describes the evidence that the active place avoidance task requires cognitive control. The evidence is multifold, and includes task design, behavior, and electrophysiology; we argue that this is more evidence than has been provided for other tasks that are asserted to require cognitive control. Specifically, line 417 states:

      “We have previously demonstrated cognitive control in the active place avoidance task variant we used (Fig. 1) because the rats must ignore local rotating place cues to avoid the stationary shock zone. Even when the arena does not rotate, rats distinctly learn to avoid the location of shock according to distal visual room cues and local olfactory arena cues, such that the distinct place memories can be independently manipulated using probe trials [49, 50]. When the arena rotates as in the present studies, neural manipulations that impair the place avoidance are no longer impairing when the irrelevant arena cues are hidden by shallow water [14, 15, 51, 52]. Furthermore, persistent hippocampal neural circuit changes caused by active place avoidance training are not detected when shallow water hides the irrelevant arena cues to reduce the cognitive control demand [10, 31, 33]. While these findings unequivocally demonstrate the salience of relevant stationary room cues to use for avoiding shock and irrelevant arena cues to ignore during active place avoidance, the most compelling evidence of cognitive control comes from recording hippocampal ensemble discharge. Hippocampal ensemble discharge purposefully represents current position using stationary room information when the subject is close to the stationary shock zone and alternatively represents rotating arena information when the mouse is far from the stationary shock zone [Fig. 4; 10].”

      Line 436, however, acknowledges something that will remain true, whatever anyone's opinion, until there are universally agreed-upon objective criteria: it is logically possible that active place avoidance does not require cognitive control. The revision states:

      “Despite this evidence from task design, behavioral observations, and direct electrophysiological representational switching as required to directly demonstrate cognitive control, one might still argue that it is logically possible that the active place avoidance task does not require cognitive control and this is why the mPFC lesion did not impair place avoidance of the initial shock zone. We consider such reasoning to be unproductive because it presumes that only tasks that require an intact mPFC can be cognitive control tasks. We nonetheless acknowledge that for some, we have not provided sufficient evidence that the active place avoidance requires cognitive control.”

      “We assert the evidence is compelling, and together these findings require rejecting the central-computation hypothesis that the mPFC is essential for the neural computations that are necessary for all cognitive control tasks.”

      (3) The authors did not link the CO activity with the behavioral parameters even though the CO imaging was done on a subset of the animals that ran the behavioral task nor did they make any attempt to interpret these findings in light of the two competing hypotheses posed in the introduction. Moreover, the discussion lacks any mechanistic interpretations of the findings. For example, there are no attempts to explain why amygdala activity and its correlation with dCA1 activity might be higher in the PFC lesioned group. 

      The CO study was performed to assess the effects of the lesion, as stated on line 262: “Cytochrome oxidase (CO), a sensitive metabolic marker for neuronal function [27], was used to evaluate whether lesion effects were restricted to the mPFC.” Furthermore, line 411 states: “Thus, CO imaging and electrophysiological evidence identify changes in the brain beyond the directly damaged mPFC area. In particular, the dorsal hippocampus loses the inhibitory input from mPFC [45, 46] and loses the metabolic correlation with the nucleus reuniens, which is thought to be a relay between the mPFC and the dorsal hippocampus [47, 48].”

      These CO measures assess baseline metabolic function and so it would be inappropriate to correlate them with the measures of behavior. Because the lesion and control groups do not differ on most measures of behavior, a relationship to CO measures is not expected. Importantly, even if there were differences in correlations between CO activity and behavioral measures, what could they mean? The study was designed to distinguish between two hypotheses, not to determine what CO differences could mean for behavior. As such, it is not at all clear how metabolic consequences of the lesion relate to the two hypotheses being evaluated, and so we consider it inappropriate to speculate. We did examine, and now include, the correlation between lesion size and conflict behavior. The Fig. 1 legend states “Savings was not related to lesion size r = 0.009, p = 0.98. *p < 0.05.”

      (4) Publishing null results is important to avoid wasting animals, time, and money. This study's results will have a significant impact on how the field views the role of the PFC in cognitive control. Whether or not some people reject the notion that the active place avoidance task measures cognitive control, the findings are solid and can serve as a starting point for generating hypotheses about how brain networks change when deprived of PFC input. 

      We thank the reviewer for the acknowledgement.

      Reviewer #3 (Public review): 

      Summary:

      This study by Park and colleagues investigated how the medial prefrontal cortex (mPFC) influences behavior and hippocampal place cell activity during a two-frame active place avoidance task in rats. Rats learned to avoid the location of mild shock within a rotating arena, with the shock zone being defined relative to distal cues in the room. Permanent chemical lesions of the mPFC did not impair the ability to avoid the shock zone by using distal cues and ignoring proximal cues in the arena. In parallel, hippocampal place cells alternated between two spatial tuning patterns, one anchored to the distal cues and the other to the proximal cues, and this alternation was not affected by the mPFC lesion. Based on these findings, the authors argue that the mPFC is not essential for differentiating between task-relevant and irrelevant information. 

      Strengths:

      This study was built on substantial work by the Fenton lab that validated their two-frame active place avoidance task and provided sound theoretical and analytical foundations. Additionally, the effectiveness of mPFC lesions was validated by several measures, enabling the authors to base their argument on the lack of lesion effects on behavior and place cell dynamics. 

      Weaknesses:

      The authors define cognitive control as "the ability to judiciously use task-relevant information while ignoring salient concurrent information that is currently irrelevant for the task." (Lines 77-78). This definition is much simpler than the one by Miller and Cohen: "the ability to orchestrate thought and action in accordance with internal goals (Ref. 1)" and by Robbins: "processes necessary for optimal scheduling of complex sequence of behaviour." (Dalley et al., 2004, PMID: 15555683). Differentiating between task-relevant and irrelevant information is required in various behavioral tasks, such as differential learning, reversal learning, and set-shifting tasks. Previous rodent behavioral studies have shown that the integrity of the mPFC is necessary for set-shifting but not for differential or reversal learning (e.g., Enomoto et al., 2011, PMID: 21146155; Cho et al., 2015, PMID: 25754826). In the present task design, the initial training is a form of differential learning between proximal and distal cues, and the conflict training is akin to reversal learning. Therefore, the lack of lesion effects is somewhat expected. It would be interesting to test whether mPFC lesions impair set-shifting in their paradigm (e.g., the shock zone initially defined by distal cues and later by proximal cues). If the mPFC lesions do not impair this ability and associated hippocampal place dynamics, it will provide strong support for the authors' local computation hypothesis.

      Thank you for these comments. In addressing them we have provided a significant revision to the manuscript’s Introduction. While authors like those cited by the reviewer have defined cognitive control, those definitions are difficult to test rigorously, as it is almost a matter of opinion whether a subject is displaying “the ability to orchestrate thought and action in accordance with internal goals" or whether they are using "processes necessary for optimal scheduling of complex sequence of behaviour." What would such definitions of cognitive control predict about neuronal activity? We have deliberately used a simple, operational definition of cognitive control because it is physiologically testable. In the revision, starting at line 93, we have provided an excerpt from Miller and Cohen (2001) with discussion. The importance of that work is that it provides explicit neuronal criteria and a means to operationally define cognitive control. As stated on Line 118 “Accordingly, cognitive control would be at work when there is sustained neuronal network representations of task-relevant information that suppresses or gates representations of salient task-irrelevant information in accord with purposeful judicious behavior.”

      We used an R+A- task variant in which there is a stationary room-frame shock zone and task-irrelevant arena-frame information. A strict correspondence to a set-shifting task design cannot be achieved with active place avoidance, because an A+R- task, which requires avoiding an arena-frame shock zone in the absence of a room-frame shock zone, can be solved trivially if the subject chooses not to move when it is in a place with no shock. However, the R+A+ task variant, in which there is both a room-frame and an arena-frame shock zone, is readily learned (see cited work below). This task variant requires the subject to judiciously shift between avoiding the room-frame shock zone using stationary room information and avoiding the arena-frame shock zone using rotating arena information. This R+A+ task variant might meet the reviewer’s criteria for cognitive control. We have recorded hippocampal and entorhinal ensemble activity during the R+A+ task variant, and it is very similar to the activity during the R+A- task we used. Nonetheless, future work will investigate the effect of mPFC lesion on the R+A+ task variant.

      Cited work:

      Fenton AA, Wesierska M, Kaminsky Y, Bures J (1998), Both here and there: simultaneous expression of autonomous spatial memories in rats. Proc Natl Acad Sci U S A 95:11493-11498.

      Kelemen E, Fenton AA (2010), Dynamic grouping of hippocampal neural activity during cognitive control of two spatial frames. PLoS Biol 8:e1000403.

      Burghardt NS, Park EH, Hen R, Fenton AA (2012), Adult-born hippocampal neurons promote cognitive flexibility in mice. Hippocampus 22:1795-1808.

      Park EH, Keeley S, Savin C, Ranck JB, Jr., Fenton AA (2019), How the Internally Organized Direction Sense Is Used to Navigate. Neuron 101:1-9.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      (1) Incorporate the cytochrome oxidase and hippocampal recordings (rationale and hypothesis) in the introduction, explaining how these aspects are relevant to the general question. 

      We have done this as requested. See lines 159-173 of the revised introduction.

      (2) Figure 1C. On Day 4-5 (conflict training) in which the shock zone was relocated 180 deg from the initial location, the behavioral tracks did not show any presence of the rat in this sector (in particular for the lesion example). Figure 4 nevertheless indicates that entrances have been made (which was expected since rats have to know that the shock zone was relocated).

      Thanks for pointing this out. The tracks are from the end of the sessions. The labels have been changed to specify which trials the tracks are from.

      (3) Figure 1C. The caption is huge as it contains the statistical analyses details. I would prefer to have these details in the text and keep the caption at a "reasonable" length. At the end of the caption (l. 190-191), it would be less confusing to keep the numbering of the training days (replace D1T1 with D2T1 and D2T9 with D3T9).

      The statistical details have been relocated to the main text and the numbering updated, as suggested, thank you.

      (4) It was not inconsiderable to show that mPFC lesion had some effects in the present task if it were only to validate the effectiveness of the lesion. This brain area has been shown to be important for planning, cognitive flexibility, etc. Indeed the authors found that the saving index was greater in sham than in mPFC rats (overdispersion in hippocampal firing was also reduced in pretraining) and interpreted this result as impaired flexibility. Would an alternative explanation be a memory deficit? I nevertheless expected that impaired flexibility in mPFC rats would be expressed in conflict trials in the form of more entrances in the zone that was initially not associated with shock (at least in the first trials of Day 4). But it appears to not be the case.

      A memory deficit is unlikely to explain the difference between the groups on the first trial of Day 5. Memory in the lesion rats was tested multiple times, specifically at the start of each trial (time to first entrance), including on the 24-h retention test, and no deficits were observed. Performance on Day 9 trial 1 is worse in the lesion group than in the controls, but it is not parsimonious to attribute this to a simple memory deficit since 24-h memory was good and similar between lesion and control rats on days 3 and 4, and memory on Day 5 was equally poor in both the lesion and control rats, as measured by time to first entrance.  

      (5) Material and methods. The injected volume of ibotenic acid should be mentioned. 

      The volume 0.2 µl was added. See line 531.

      (6) The rationale for doing the conflict training session should be indicated somewhere. 

      The rationale was provided. See lines 204-208.

      Reviewer #2 (Recommendations for the authors): 

      (1) Line 132: The text states that all sham rats improved and only 6/10 lesion rats improved is followed by a t-test, which tests the difference between means; it does not compare proportions. Also, what criterion was used to determine if an improvement was seen or not? 

      The statistical comparison is now provided (line 230: test of proportions, z = 2.3, p = 0.03). Improvement was defined simply as numerically fewer entrances.
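      For readers less familiar with the test of proportions, the comparison can be illustrated with a minimal sketch of a pooled two-proportion z-test. The group sizes are an assumption for illustration only: the response gives 6/10 improved lesion rats but does not state the number of sham rats here, so the sketch uses a hypothetical 10/10 sham group, and its output is not meant to reproduce the reported statistics. With counts this small the normal approximation is rough, and an exact test (e.g., Fisher's) is a common alternative.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion under the null hypothesis that p1 == p2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical counts (assumption): 10/10 sham rats improved vs 6/10 lesion rats improved
z, p = two_proportion_z_test(10, 10, 6, 10)
print(f"z = {z:.2f}, p = {p:.3f}")
```

      With these assumed counts the sketch prints z = 2.24, p = 0.025; different sham group sizes would change both values.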

      (2) Line 138: This is a very long and confusing sentence. Consider revising for clarity. 

      The sentence (now line 234) was revised.

      (3) Figure 1B only includes data from 3 animals. Most published studies show the whole dataset by presenting the largest and smallest lesions. 

      Supplemental Figure S2 was added with all the lesions depicted and quantified.

      (4) Figure 1C suggestion to make the schematic shock zone line up with the shock zone shown for the tracking data. 

      Graphically, it looks better as drawn because it uses perspective to depict a three-dimensional structure.

      (5) Methods: Clarify if the shock zone location was the same across all rats. 

      Line 570 states that the shock zone was the same for all rats.

      (6) Line 158: "Behavioral tracks" is not clear. Suggest more precise wording.

      Reworded to “Tracked room-frame positions” (now line 249)

      (7) Line 166: "effect of trial" - should this be the main effect of trial?; "interaction" - should this be "group x trial" interaction? 

      Reworded (now line 181).

      (8) Line 167: "or their interaction" is awkward in the context of the sentence. 

      Reworded (now line 182).

      (9) Line 182: Avoid talking about "trends" as if they are almost significant unless the authors suspect that they did not have sufficient statistical power to detect differences. In that case, a power analysis should be provided. 

      Removed.

      (10) Line 190: "left:...right..." is hard to follow, especially with acronyms like D1T1. Consider revising for clarity. 

      Revised (now lines 246-248).

      (11) Line 195: "effectiveness of the PFC to impair" is unnecessarily verbose. 

      Reworded (now lines 255-257).

      (12) Savings results: There is a lot of variability in the lesion group. It would be interesting to know if the extent of the lesion correlates with savings.

      Savings was not related to lesion size (see line 259).

      (13) Line 300: The thalamic recording results are not reported in the results section (other than appearing in the table). Moreover, there is no detail about which thalamic nucleus these recordings are from.

      Lines 411 and 614 provide these details.

      (14) Line 312: "no longer impair" contains a grammatical error. 

      Corrected (now line 422)

      (15) Line 325: "was not impairing" contains a grammatical error. 

      Corrected (now line 437).

      (16) Line 327: The sentence ending with "...opinion of others" seems unnecessarily confrontational. 

      Because previous reviewers at other journals maintained this position, we included such a strong statement in our initial submission. We have now revised this statement to avoid appearing confrontational.

      (17) Line 329: Sentence is awkward. Consider revising. 

      Revised (now line 443).

      (18) Line 384: The authors should disclose if there was an objective metric for determining the adequacy of the lesion. 

      The lesion assessment and quantification are now better explained in the Methods under “Cytochrome oxidase activity and Nissl staining” (lines 708-714).

      (19) Line 385: The authors should clarify how they got from 15 rats (Line 376) to 10. 

      This information is provided in the methods.

      (20) Line 390: It is not clear why skin irritation in the cage mate would prevent the rat from being tested. 

      This has been explained in the Methods under “Behavioral analysis followed by cytochrome oxidase activity” (lines 515-518).

      (21) Methods section: The authors should describe how the tracking data were acquired. Overhead camera? Tracker based on luminance or body position? What software program was used? What was the sampling rate? 

      This is now better explained in the Methods under “Active place avoidance task” (lines 538-551).

      (22) Methods section: Include how fast the arena was rotating and other details about the task such as where rats were placed during the ITI. 

      Better explained in the Methods under “Active place avoidance task”.

      (23) Line 439: The recording system used (hardware & software) should be stated. 

      This is now included in the Methods (line 538).

      (24) Line 435: Though overdispersion calculation is described thoroughly, there is nothing in the paper that tells me what overdispersion means. 

      What the measure means is now described in the Methods under “Electrophysiology data analysis” (lines 646-650).

      (25) Line 561: The test used to assess effect sizes should be stated. 

      Effect sizes corresponding to the statistical tests are provided.

      Reviewer #3 (Recommendations for the authors): 

      (1) At the end of the conflict training, rats with mPFC lesions learned to avoid the new shock zone (Figure 1F, Block 16), but their place cells did not show room-preferring activity near the shock zone (Figure 4B). This observation questions whether spatial frame-specific representation is relevant for active avoidance. Can the authors clarify this point?

      This is a dynamic behavior and the hippocampal dynamics match, changing with a dynamic that is a few seconds, as we have shown in several published papers. The lack of a preference averaged over 20 minutes when the rats are avoiding both the current and former shock zones during the conflict session is pretty much what would be expected from such a coarse measurement. The important measure is the spatially-resolved measure of room versus arena preference. Figure 4B shows that in the lesion rats there is less of a frame preference during conflict, generally (consistent with poorer flexibility). However, Figure 4D quantifies the frame preference near and far from the shock zone and accordingly, there is no difference between the groups.

      (2) Related to the point above, the author might consider including panels in Figures 4C and D to show the neural activity during the pretraining and conflict training retention period. I assume p(room) will be comparable between the Near and Far segment in both sessions, but the p(room) may be higher in the Conflict training session than the Pretraining session. This would show that the mPFC lesion impairs suppressing the place cell activity encoding the old shock location. 

      Thanks for the suggestion. While we don’t think we can draw any strong conclusions from this analysis we are fine to show it. The issue is that during conflict, the rats have two perfectly reasonable representations of where there was shock, the initial location that was turned off to make the conflict, and the most recent conflict location of shock. Importantly, these recordings are during conflict retention after we turned off the shock for the retention recording (for the second time in the rat’s experience). Turning off the shock allows us to exactly match the physical conditions of pretraining, initial retention and conflict retention, which was the experimental design’s goal. However, the experiential history of the rats prior to initial retention and conflict retention cannot match, because during initial retention the rats had never experienced a changed shock zone whereas, by conflict retention, they had experienced multiple changes. Importantly, we have previously shown that mouse hippocampal ensembles represent both initial and conflict shock locations, as the animals consider their options during conflict trials (see Dvorak et al 2018, PLoS Biol 16:e2003354). Consequently, we cannot make any strong predictions about whether or not hippocampal activity during conflict retention should be room-frame preferring selectively in the vicinity of the current shock zone. As I am sure the reviewer appreciates from their own introspection, mental representations are mercifully not obliged to dictate behavior. In fact, that is what is interesting and controversial about cognitive control – it is a dynamic internal process and the innovation of our work lies in demonstrating that one cannot only rely on behavior to assess this process. Nonetheless, we did this analysis and now present it in the revised Fig. 4. During pretraining both lesion and sham groups express no particular spatially-modulated preference for either the room or the arena frame, as expected. 
      During initial training both groups express a room-frame preference in the vicinity of the shock zone, as we initially reported. By inspection, during conflict, the sham rats express a preference for room-frame activity in the vicinity of the most recent shock zone location; this preference is weaker than what is expressed during initial retention. The lesion rats do not show this preference. These impressions are quantified in revised Fig. 4D; the comparisons within the conflict retention sessions did not reach statistical significance. We leave it to the reader to interpret what that means. Thanks for the nudge.

      (3) The significant group difference in place cell overdispersion during the pretraining phase (Figure 3C) is interesting, but some readers would appreciate additional sentences on its functional implication. Does it mean the spatial tuning of place cells was disrupted by the mPFC lesion?

      Only the reliability of spatial firing was altered, not the spatial tuning.

      (4) Although the method section described how to calculate overdispersion and SFEP, some concise, intuitive descriptions of these measures in the result section would help readers understand these results.

      Overdispersion is better explained. See lines 646-650.

      (5) I recommend adding a figure of the task performance of the rats used in the electrophysiological recording experiment and a table summarizing the number of cells recorded per animal. 

      We have included Table S2 with the cell counts and a summary of the performance of each rat in the electrophysiological recording experiment.

      (6) Readers would appreciate additional information on task apparatus, such as the size, appearance, and rotating speed of the arena, as well as stationary cues available in the room. 

      This is now provided in the Methods under “Active place avoidance task”.

      (7) Lines 425-416: "On the fourth day of the behavioral training, the rats had a single trial with the shock on to test retention of the training." Shouldn't it be "shock off"? 

      No, the shock was on to prevent extinction learning and to increase the challenge of conflict learning.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Major Concerns/Public Review

      Comment 1: There is a mild disconnect between behavioral readout (reflexive pain) and neural circuits of interest (emotional). Considering that this circuit is likely engaged in the aversiveness of pain, it would have been interesting to see how carrageenan and/or AIE impacted non-reflexive pain measures. Perhaps this would reveal a potentiated or dysregulated phenotype that matches the neurophysiological changes reported. However, this critique does not take away from the value of the paper or its conclusions.

      We agree that including measures of non-reflexive pain would enhance future studies and potentially reveal a phenotype that is closely related to the observed changes in neurophysiology.

      Minor Concerns/Recommendations

      Comment 1: There are a few minor grammatical errors in the text, mostly in the captions. A close read should be able to identify these errors.

      We have fixed what grammatical errors we found.

      Reviewer #2:

      Major Concerns/Public Review

      No major concerns.

      Minor Concerns/Recommendations

      Comment 1: If pain sensitivity was assessed at 3 time points post carrageenan administration, why were these data averaged? Were there no differences between the time points? The data from the 3 time points should be presented, either in a figure, table, or supplementary materials.

      We averaged the pain sensitivity data across the 3 time points following carrageenan administration to present these data more concisely. Pain sensitivity did change over time following carrageenan administration, and we have now included the unaveraged data in Figure 2 (panels D, F, H, and J).

      Comment 2: For the optically-evoked EPSCs and IPSCs, were the peak amplitudes the max responses that could be obtained? If not, how were levels of ChR2 expression or light intensity controlled for?

      The peak amplitudes for EPSCs and IPSCs were half the maximal response that could be evoked by optical stimulation. The AMPA and NMDA currents were maximal responses as prior literature indicated some PVINs have small NMDA currents, and we wanted to ensure these currents would be detected reliably. We updated our methods section to include this information in the voltage clamp recordings section.

      Comment 3: In the example traces for the aEPSC experiment, the figure legend states that the "+" symbol indicates an asynchronous event. However, there are several "|" or "-" symbols in the figure. Perhaps this is an issue with the resolution of the figure and those are supposed to be "+"s.

      We have increased the resolution of the figures to ensure that the markings of the asynchronous events display properly. We apologize for not noticing that these symbols were not displayed correctly in the original figures included in the manuscript.

      Comment 4: For the von Frey and the Hargreaves test, were animals acclimated to the apparatus in the days leading up to the first test, or was the 5-minute pre-test the only acclimation that was done? This information needs to be provided. If the latter, there is concern that the animals did not fully acclimate to the apparatus and handling prior to testing, which should be taken into consideration in the interpretation of the behavioral analyses.

      The rats underwent handling once a day for three days prior to the first von Frey and Hargreaves tests. On the day prior to the first test, rats were acclimated to the von Frey and Hargreaves apparatuses. The acclimation period consisted of a 15-min exposure to the von Frey apparatus and a 30-min exposure to the Hargreaves apparatus for each animal. This information has been added to the revised methods section under the assessment of mechanical and thermal sensitivity heading.

      Reviewer #3:

      Major Concerns/Public Review

      Comment 1: There is incomplete evidence supporting some of the conclusions drawn in this manuscript. The authors claim that the changes in feedforward inhibition onto pyramidal cells are due to the changes in parvalbumin interneurons, but evidence is not provided to support that idea. PV cells do not fire action potentials spontaneously in slices (nor do they receive high levels of BLA activity while at rest in slices). It is possible that spontaneous GABA release from PV cells is increased after AIE, but the authors did not report sIPSC frequency. Second, the authors did not determine that PV cells mediate the feedforward BLA op-IPSCs and changes following AIE (this would require manipulation to reduce/block PV-IN activity). This limitation in results and interpretation is important because prior work shows BLA-PFC feedforward IPSCs can be driven by somatostatin cells. Cholecystokinin cells are also abundant basket cells in PFC and have been recently shown to mediate feedforward inhibition from the thalamus and ventral hippocampus, so it's also possible that CCK cells are involved in the effects observed here.

      The hypothesis that adolescent alcohol exposure could change spontaneous GABA release from PVINs is an interesting one that merits future exploration. Unfortunately, because the focus of this manuscript was on circuit-specific alterations in synaptic function, this experiment is somewhat outside its scope: sIPSCs and mIPSCs are not circuit-specific measures of GABAergic transmission and would not isolate spontaneous release from only those GABAergic interneurons receiving input from the BLA. Despite this, a future study investigating spontaneous GABA release from PVINs in the PrL would be a valuable complement to the present study.

      While we did not directly manipulate PVINs to demonstrate that decreased oIPSC amplitude at PrL<sup>PAG</sup> neurons following AIE is due solely to changes in PVINs, it is notable that both the intrinsic excitability of PVINs and the BLA-driven E/I balance at PVINs were reduced following AIE. These changes would be consistent with decreased PVIN output onto PrL<sup>PAG</sup> neurons. However, we agree that this does not preclude the possibility that changes in SST or CCK interneurons contribute to the observed decrease in BLA-driven inhibition at PrL<sup>PAG</sup> neurons following AIE. As such, we have altered the wording in the discussion to indicate that reduced BLA-driven feedforward inhibition of PrL<sup>PAG</sup> neurons may be related, at least in part, to the observed changes in PVINs.

      Comment 2: The authors conclude that the changes in this circuit likely mediate long-lasting hyperalgesia, but this is not addressed experimentally. In some ways, the focused nature of the study is a benefit in this regard, as there is extensive prior literature linking this circuit with pain behaviors in alternative models (e.g., SNI), but it should be noted that these studies have not assessed hyperalgesia stemming from prior alcohol exposure. While the current studies do not include a causative behavioral manipulation, the strength of the association between BLA-PL-PAG function and hyperalgesia could be bolstered by current data if there were relationships detected between electrophysiological properties and hyperalgesia. Have the authors assessed this? In addition, this study is limited by not addressing the specificity of synaptic adaptations to the BLA-PL-PAG circuit. For instance, PL neurons send reciprocal projections to BLA and send direct projections to the locus coeruleus (which the authors note is an important downstream node of the PAG for regulating pain).

      We have not assessed correlations between the electrophysiological properties and hyperalgesia. We feel that future studies using DREADDs to perform cell-type and circuit-specific manipulations can better address the involvement of this circuitry in long-lasting hyperalgesia following AIE. With respect to the circuit specificity of the observed changes, we have previously evaluated the effects of AIE on pyramidal neurons projecting from the PrL to the BLA (PrL<sup>BLA</sup>). We found that following AIE exposure there was no change in the intrinsic excitability of these neurons. In addition, the amplitude and frequency of sEPSCs and sIPSCs onto PrL<sup>BLA</sup> neurons were unchanged. While these results did not assess whether the BLA-PrL-BLA circuit undergoes synaptic adaptations similar to those observed in the BLA-PrL-vlPAG circuit, it is notable that the intrinsic excitability of PrL<sup>BLA</sup> neurons was unchanged following AIE exposure. This indicates that the effects of AIE on the intrinsic excitability of pyramidal neurons in the PrL may be circuit specific. We agree that it would be interesting to study the effect of AIE on PrL neurons that project to the locus coeruleus; however, because of the well-defined role of the BLA-PrL-vlPAG circuit in pain, we chose to evaluate this circuit first.

      Comment 3: I have some concerns about methodology. First, a 5-ms light pulse is long for optogenetics and might induce action-potential-independent release. Does TTX alone block op-EPSCs under these conditions? Second, PV cells express a high degree of calcium-permeable AMPA receptors, which display inward rectification at positive holding potentials due to blockade from intracellular polyamines. Typically, this is controlled/promoted by including spermine in the internal solution, but I do not believe the authors did that. Nonetheless, the relatively low A/N ratios for this cell type suggest that CP-AMPA receptors were not sampled with the +40/+40 design of this experiment, raising concerns that the majority of AMPA receptors in these cells were not sampled during this experiment. Finally, it should be noted that aEPSC frequency can also reflect changes in the number of functional/detectable synapses. This measurement is also fairly susceptible to inter-animal differences in ChR2 expression. There are other techniques for assessing presynaptic release probability (e.g., PPR, MK-801 sensitivity) that would improve the interpretation of these studies if that is intended to be a point of emphasis.

      When we included TTX but not 4-AP we did not observe any optically evoked responses, so we don’t believe that the 5-ms pulse induced action-potential-independent release in these experiments. With respect to the second point, we did not include spermine in the internal solution for the AMPA/NMDA recordings in PVINs, and it is possible that endogenous polyamines interfered with recording CP-AMPA receptors in the +40/+40 design. To address this concern, we recalculated the AMPA/NMDA ratio for PVINs using data from an optically evoked AMPA current that was collected while holding the cell at -70 mV. These data were collected at the end of the +40/+40 recording protocol, as we were interested in assessing whether there would be any difference in the ratio of the +40/-70 AMPA current across treatment conditions. As there was no observed difference in the +40/-70 AMPA current ratio across treatment groups, we had originally used the +40 AMPA current for calculating the AMPA/NMDA ratio for PVINs to make the methods for calculating this ratio uniform for both PVINs and PrL<sup>PAG</sup> neurons. The methods, results, and Fig. 10 have been updated to reflect the recalculated AMPA/NMDA ratio for PVINs. Notably, only the significance of the AIE x carrageenan interaction was altered by the change in the way the AMPA/NMDA ratio was calculated. Originally, this interaction displayed a trend toward significance (p = 0.0501); however, when the recalculated AMPA/NMDA ratio was analyzed, this interaction term became significant (p = 0.0131). We have also added the +40/-70 AMPA ratio to Fig. 10 as it might be of interest.

      Finally, the point regarding aEPSC frequency reflecting not only release probability but also the number of functional/detectable synapses is an important consideration. For this manuscript, we intentionally selected aEPSC frequency for this reason. As the BLA to PrL projection continues to mature during adolescence, the number of BLA contacts onto GABA neurons in the PrL increases. Thus, we thought that it was possible that AIE would alter the number of detectable BLA inputs onto PVINs. We acknowledge that as this measure is sensitive to differences in ChR2 expression between animals/slices it can be difficult to interpret. We also agree that in the future it would be beneficial to include either PPR or MK-801 sensitivity to improve interpretability.

      Comment 4: In a few places in the manuscript, results following voluntary drinking experiments (especially Salling et al. and Sicher et al.) are discussed without clear distinction from prior work in vapor models of dependence.

      We have altered the manuscript to specifically note where voluntary drinking was used rather than vapor models.

      Comment 5: Discussion (lines 416-420). The authors describe some differing results with the literature and mention that the maximum current injection might be a factor. To me, this does not seem like the most important factor and potentially undercuts the relevance of the findings. Are the cells undergoing a depolarization block? Did the authors observe any changes in the rheobase or AP threshold? On the other hand, a more likely difference between this and previous work is that the proportion of PAG-projecting cells is relatively low, so previous work in L5 likely sampled many types of pyramidal cells that project to other areas. This is a key example where additional studies by the current group assessing a distinct or parallel set of pyramidal cells would aid in the interpretation of these results and help to place them within the existing literature. Along these lines, PAG-projecting neurons are Type A cells with significant hyperpolarization sag. Previous studies showed that adolescent binge drinking stunts the development of HCN channel function and ensuing hyperpolarization sag. Have the authors observed this in PAG-projecting cells? Another interesting membrane property worth exploring with the existing data set is the afterhyperpolarization / SK channel function.

      In discussing the maximum current injection as a factor in differing results on intrinsic excitability, we were principally considering how the additional data points increase the power of the analysis and thus the likelihood of detecting an effect. In focusing on this, however, we ignored other relevant and interesting factors that we should also have discussed. Additional analyses examining HCN and SK channel function have now been added to the manuscript and incorporated into the results section under the heading Adolescent Intermittent Ethanol Exposure and Carrageenan Enhanced the Intrinsic Excitability of Prelimbic Neurons Projecting to the Ventrolateral Periaqueductal Gray. We have also modified the third paragraph in the discussion to add additional context. Additional information on the biophysical properties of the neurons has been added to Figure 4.

      Minor Concerns/Recommendations

      Comment 1: Subheadings are vague. "Analysis of..." Should be rephrased to use active voice to describe key findings.

      The subheadings have been rephrased to describe key findings.

      Comment 2: Consider altering or consolidating the figure layout for clarity. For instance, it would be helpful for aEPSCs to be near the AMPA and NMDA experiments. The feedforward IPSCs could also be with the PV-IN recordings. This would be helpful in developing a cohesive picture of key findings. To that end, a working model or graphical abstract would be helpful.

      It doesn’t appear that this journal allows graphical abstracts, but we have added a model that summarizes the principal findings in the discussion.

      Comment 3: There are a lot of statistics punctuating the text in the Results. It can be hard to parse at times.

      We considered moving the statistics to tables, but this became unwieldy.

      Comment 4: The Discussion is quite long (10 paragraphs). Suggest consolidating to 3-4 most salient points.

      We appreciate this comment and have made some edits to the discussion, albeit without consolidating it to only 3-4 points.

    1. Author response:

      eLife Assessment

      The authors provide a useful summary of ten years of Brain Initiative funding including the historical development, the specific funding mechanisms, and examples of grants funded and work produced. The authors also conduct analyses of the impact on overall funding in Systems and Computational Neuroscience, the raw and field normalized bibliographic impact of the work, the social media impact of the funded work, and the popularity of some tools developed. The evidence for impact is incomplete due to the omission of a comparison group of funded grants.

      In this combined version, we include a comparison group of non-BRAIN Initiative R01s derived from the parent notice of funding opportunity from FY2014-2022. We performed a bibliometric analysis of the publications, citations, RCR, and budget productivity measures of these non-BRAIN parent R01s.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a convincing description of approximately ten years of funding from the NIH BRAIN initiative. It is of particular value at this moment in history, given the cataclysmic changes in the US government structure and function occurring in early 2025.

      Strengths:

      The paper contains a fair bit of documentation so that the curious reader can actually parse what this BRAIN program funded.

      Weaknesses:

      There are too many acronyms, and the manuscript reads as if it were an internal NIH document, where the audience knows all of the NIH nomenclature and program details. It is not particularly friendly to the outside, lay reader.

      In this version, we have attempted to minimize acronyms and explain NIH nomenclature and program details to make it more accessible to readers not familiar with NIH terminology.

      Reviewer #2 (Public review):

      Summary:

      The authors provide an important summary of ten years of Brain Initiative funding including a description of the historical development of the initiative, the specific funding mechanisms utilized, and examples of grants funded and work produced. The authors also conduct analyses of the impact on overall funding in Systems and Computational Neuroscience, the raw and field normalized bibliographic impact of the work, the social media impact of the funded work, and the popularity of some tools developed.

      Strengths:

      This is a useful perspective on an important funding initiative over a ten-year period. It is clearly written and the illustrations and analyses are mostly useful for understanding the impact of the initiative.

      Weaknesses:

      The major limitation is that the bibliographic analysis does not provide a comparison group of funded grants. Because work that successfully competes for funding is likely to be more impactful than all work in a given area, the normalization of citations to field medians may reflect this "grant review" effect, rather than anything special about the Brain Initiative. Hopefully, this speculation is incorrect (I would guess that it is), but it would be helpful to try to demonstrate this more directly by including a funded comparison group.

      In this version, we have provided a comparison group of parent R01s that were not funded through the BRAIN Initiative from FY2014-2022 in Figure 3. We include publication metrics and budget efficiency measures for this comparison group.

      There are also minor inconsistencies in the numbering of the figures that need to be cleared up.

      We have updated the figure numbers.

    1. Author response:

      eLife Assessment 

      The manuscript presents some useful accounts of experiences funding team projects within the BRAIN Initiative. These would be more appropriate to add to the companion manuscript since the present manuscript contains some overlapping analyses and does not stand well on its own. Therefore the evidence supporting the conclusions is incomplete. 

      We appreciate the feedback on merging both manuscripts into one and have followed the advice in this version. 

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      In this useful narrative, the authors attempt to capture their experience of the success of team projects for the scientific community.  

      Strengths: 

      The authors are able to draw on a wealth of real-life experience reviewing, funding, and administering large team projects, and assessing how well they achieve their goals. 

      Weaknesses: 

      The utility of the RCR as a measure is questionable. I am not sure if this really makes the case for the success of these projects. The conclusions do not depend on Figure 1. 

      We respectfully disagree about the utility of the RCR, particularly because it is a metric that is normalized by both year and topical area. We have added a more detailed description of how the RCR is calculated on pages 6-7. Please note that Figure 1 is intended to highlight the funding opportunities, investments, and number of awards associated with small-lab (exploratory) versus team (elaborated, mature) research rather than to describe publication metrics.

      Reviewer #2 (Public review): 

      Summary: 

      The authors review the history of the team projects within the Brain initiative and analyze their success in progression to additional rounds of funding and their bibliographic impact. 

      Strengths: 

      The history of the team projects and the fact that many had renewed funding and produced impactful papers is well documented. 

      Weaknesses: 

      The core bibliographic and funding impact results have largely been reported in the companion manuscript and so represent "double dipping." I presume the slight disagreement in the number of grants (by one) represents a single grant that was not deemed to address systems/computational neuroscience. The single figure is relatively uninformative. The domains of study are sufficiently large and overlapping that there seems to be little information gained from the graphic, and the Sankey plot could be simply summarized by rates of competing success.

      While we sincerely appreciate the feedback, we chose to retain these plots on domains and models to provide a sense of the broad spectrum of research topics contained in our TeamBCP awards. Further details on the awards can be derived from the award links provided in the text. Additionally, we retained the Sankey plots because they visually depict how awards transition from one mechanism to another, evolve in their funding sources, and advance in their research trajectories. The plot is an example of our continuity analysis, which for the remaining BCP programs is reported only in the text and not shown visually.

    1. Author response:

      We thank the reviewers for their detailed and constructive comments on our manuscript entitled “Activity-Dependent Changes in Ion Channel Voltage-Dependence Influence the Activity Patterns Targeted by Neurons.” We appreciate the time and effort the reviewers invested in critiquing our work and are grateful for the opportunity to clarify and improve our manuscript.

      As noted by the reviewers, the main message of the manuscript is that the intrinsic properties and activity characteristics of targeted bursters depend on the timescale of half-(in)activation alterations in the homeostatic mechanism. However, the concerns of the reviewers reveal that the manuscript is organized in ways that detract from this message. Below we respond to the points the reviewers raise and close by outlining the changes that we will make to the manuscript as a result. Our goal will be to streamline the message of the paper while addressing the concerns of the reviewers.

      Response to Reviewer #1:

      Point 1: We interpret the reviewer’s question about “mechanism” to be: why do half-(in)activation alterations redirect degenerate bursters to different parameter regions? (A separate aspect of “mechanism,” namely how these alterations might be biologically implemented, is already addressed in the paper.)

      We speculate that Figure 3 illustrates this process. As conductance densities slowly evolve, rapid half-(in)activation changes cause the sensor variable (α) to jump abruptly as it searches for a voltage-dependence configuration that meets calcium targets (Figure 3A). The channel densities are then slightly altered, and the process repeats. Slowing the half-(in)activation alterations reduces these abrupt fluctuations (Figure 3B). Making the alterations infinitely slow effectively removes half-(in)activation changes altogether, leaving the system reliant solely on slower alterations in maximal conductances (Figure 3C). Because each timescale of half-(in)activation produces a different channel repertoire at each time step, the neuron follows distinct trajectories through the space of activity characteristics and intrinsic properties over the long term.

      Point 2: We appreciate the reviewer’s skepticism regarding our statistical approach with the “Group of 5” and “Group of 20.” These groups arose from historical aspects of our analysis, and this analysis does not directly advance the main point: that changes in the timescale of channel voltage-dependence alterations impact the properties of the bursters to which the homeostatic mechanism converges. Therefore, we plan to remove the references to the Group of 5 and focus on how the Group of 20 responds to variations in the timescale of voltage-dependence alterations.

      Point 3: Our paper claims that the half-(in)activation mechanism is subordinate to the maximal conductance mechanism. We agree with the reviewer that making this claim requires more care. The simulations we run are controls in the spirit described below.

      The reviewer notes that in our simulations, half-(in)activations are already near the range required for bursting, which forces maximal conductances to undergo larger changes and thus appear more critical. However, we note that the opposite can also occur: if half-(in)activation values were already positioned in ranges required for bursting, an arrangement of small maximal conductances might produce bursting. The latter might give the impression that maximal conductance alterations and half-(in)activation alterations are equally important. The simulations we ran simply suggest that this was not the case for these models.

      Points 4 - 6: In Point 4, the reviewer highlights that some model choices (e.g., constraints on maximal conductance and half-(in)activation, use of the L8 norm) are not clearly justified. In Point 5, the reviewer suggests that the paper provides excessive detail about other model choices. Point 6 appears to reiterate concerns about insufficient justification for some modeling decisions.

      Our intent was to acknowledge every caveat, which led us to include a long section on Model Assumptions in the Discussion. However, as Point 5 notes, this makes the Discussion cumbersome. The Discussion should focus on the impact that the timescale of half-(in)activation alterations has on the family of bursters targeted by the homeostatic mechanism. Consequently, we will relocate the extended discussion of model assumptions from the Discussion to the Methods section. This section already touches on how the constraints on half-(in)activation alterations compare to earlier versions of the model (noted in Point 6) and will be expanded to further explain our choice of the L8 norm (Point 4).

      Response to Reviewer #2:

      Weakness 1: The reviewer notes that the writing is “rather confusing.” This likely arises from the fact that we did not consistently emphasize the core message: the timescale of half-(in)activation alterations influences the intrinsic properties and activity characteristics of bursters targeted by the homeostatic mechanism. We will address this by reorganizing the manuscript to make that focus clearer, and we outline these planned revisions at the end of these responses.

      The reviewer specifically points out that the state-of-the-art is not clearly articulated. We will reorganize the Introduction to highlight this. Briefly, work on activity-dependent homeostasis has historically focused on changes in channel density. This is supported by experiment and has been modelled theoretically. In comparison, changes in channel voltage-dependence, while documented, are less explored due to the challenges of measuring them. In this work, we attempt to study the impact that alterations in channel voltage-dependence have on activity-dependent homeostasis. To do this, we extend existing computational models of activity-dependent homeostasis—models that have hitherto only altered channel density—by incorporating a mechanism that also adjusts channel voltage-dependence.

      Weakness 2: The Discussion highlights two potential implications of our findings—one for neuronal development and another for activity recovery following perturbations. However, they were outlined after the Model Assumptions section which, as Reviewer 1 points out, is quite detailed and cumbersome.

      Another aspect that may contribute to the challenge in interpreting our results may be our conceptual approach to neuronal excitability, which relies on a computational model of activity-dependent homeostasis that abstracts much of the underlying biochemistry. Our message is general: the timescale of half-(in)activation alterations influences the intrinsic properties and activity characteristics of bursters targeted by a homeostatic mechanism. As such, the implications are general. Their value lies in circumscribing a conceptual framework from which experimentalists may devise and test new hypotheses. We do not aim to predict or explain any specific phenomenon in this work. To address this concern however, we will expand our discussion of how these findings may guide experimental considerations, particularly regarding neuronal development and activity recovery during perturbations, to better illustrate the practical utility of our results.

      Response to Reviewer #3:

      Point 1: This reviewer suggests that our core message, namely that the timescale of half-(in)activation alterations affects the intrinsic properties and activity patterns targeted by a homeostatic mechanism, should also apply during perturbations. We plan to address this by extending our analysis of the Group of 20 models. We will perturb activity by increasing the extracellular potassium concentration and change the timescale of half-(in)activation alterations during the perturbation. This should underscore how the neuron’s stabilized activity pattern depends on this timescale, reinforcing our central message.

      Point 2: In this part of the Discussion, we noted that multiple half-activation shifts collectively shape the neuron’s global properties, and that averaging might obscure these effects. However, in light of the reviewers’ comments, we recognize that this observation alone does not directly advance the paper’s main message. To make it relevant, we would need to (1) identify correlations between intrinsic parameters (i.e., half-(in)activation and maximal conductance) and the resulting activity patterns, and (2) examine how these correlations shift under different timescales of half-(in)activation alterations. Since we have not performed that analysis, we will revise this part of the Discussion to clarify its connection to the paper’s principal focus, noting that a deeper exploration of this notion using correlations will be the topic of future work.

      Conclusion: We outline updates we will make to the paper here.

      Introduction: In response to Reviewer 2, we will provide a clearer explanation of the state-of-the-art in activity-dependent homeostasis and highlight our specific contribution. We will emphasize that our conclusions, while generic, are relevant in experimental contexts.

      Results: We will reorganize this section to underscore the main point: the timescale of half-(in)activation alterations affects the intrinsic properties and activity characteristics of bursters targeted by the homeostatic mechanism. Figure 1 will remain as it is. It shows assembly from random initial conditions, and we will explain that in these simulations the half-(in)activation mechanism must always be paired with a mechanism that alters maximal conductances, because half-(in)activation alterations alone cannot form bursters. Figure 2 will remain as is, but we will remove any discussion of the “Group of 5,” addressing Reviewer 1’s feedback. What is presently Figure 4 will then follow, illustrating how timescale differences shape the properties of 20 degenerate solutions. We then present Figure 3 to address Reviewer 1’s critique on mechanism. Here we will explain how different timescales of half-(in)activation alteration cause the homeostatic mechanism to update channel properties differently, leading to distinct trajectories through the space of intrinsic properties and activity characteristics (as described in our response to Point 1 of Reviewer 1). Finally, following Point 1 of Reviewer 3, we will add a new figure highlighting the role of the half-(in)activation timescale during perturbation.

      Discussion: To streamline the Discussion, the “Model Assumptions” section will be moved to Methods. In line with Point 2 of Reviewer 3, we will clarify how the concept of "small half-(in)activation shifts lead to global changes in neuronal properties" aligns with our core message. Additionally, following Reviewer 2’s comments, we will expand our discussion of implications by including how experimentalists might use our findings to inform studies on perturbations and development.

      Methods: We will expand “Model Assumptions” to explain in more detail why we chose the L8 norm.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Overall, the conclusions of the paper are mostly supported by the data but may be overstated in some cases, and some details are also missing or not easily recognizable within the figures. The provision of additional information and analyses would be valuable to the reader and may even benefit the authors' interpretation of the data.

      We thank the reviewer for the thoughtful and constructive feedback. We are pleased that the reviewer found the overall conclusions of our paper to be well supported by the data, and we appreciate the suggestions for improving figure clarity and interpretive accuracy. Below we address each point raised:

      The conclusion that DREADD expression gradually decreases after 1.5-2 years is only based on a select few of the subjects assessed; in Figure 2, it appears that only 3 hM4Di cases and 2 hM3Dq cases are assessed after the 2-year timepoint. The observed decline appears consistent within the hM4Di cases, but not for the hM3Dq cases (see Figure 2C: the AAV2.1-hSyn-hM3Dq-IRES-AcGFP line is increasing after 2 years.)

      We agree that our interpretation should be stated more cautiously, given the limited number of cases assessed beyond the two-year timepoint. In the revised manuscript, we will clarify in both the Results and Discussion that the observed decline is based on a subset of animals. We will also state that while a consistent decline was observed in hM4Di-expressing monkeys, the trajectory for hM3Dq expression was more variable, with at least one case showing an increase in signal beyond two years.

      Given that individual differences may affect expression levels, it would be helpful to see additional labels on the graphs (or in the legends) indicating which subject and which region are being represented for each line and/or data point in Figure 1C, 2B, 2C, 5A, and 5B. Alternatively, for Figures 5A and B, an accompanying table listing this information would be sufficient.

      We thank the reviewer for these helpful suggestions. In response, we will revise the relevant figures as noted in the “Recommendations for the authors”, including simplifying visual encodings and improving labeling. We will also provide a supplementary table listing the animal ID and brain regions for each data point shown in the graphs.

      While the authors comment on several factors that may influence peak expression levels, including serotype, promoter, titer, tag, and DREADD type, they do not comment on the volume of injection. The range in volume used per region in this study is between 2 and 54 microliters, with larger volumes typically (but not always) being used for cortical regions like the OFC and dlPFC, and smaller volumes for subcortical regions like the amygdala and putamen. This may weaken the claim that there is no significant relationship between peak expression level and brain region, as volume may be considered a confounding variable. Additionally, because of the possibility that larger volumes of viral vectors may be more likely to induce an immune response, which the authors suggest as a potential influence on transgene expression, not including volume as a factor of interest seems to be an oversight.

      We thank the reviewer for raising this important issue. We agree that injection volume is a potentially confounding variable. In response, we will conduct an exploratory analysis including volume as an additional factor. We will also expand the Discussion to highlight the need for future systematic evaluation of injection volume, especially in relation to immune responses or transduction efficiency in different brain regions.

      The authors conclude that vectors encoding co-expressed protein tags (such as HA) led to reduced peak expression levels, relative to vectors with an IRES-GFP sequence or with no such element at all. While interesting, this finding does not necessarily seem relevant for the efficacy of long-term expression and function, given that the authors show in Figures 1 and 2 that peak expression (as indicated by a change in binding potential relative to non-displaced radioligand, or ΔBPND) appears to taper off in all or most of the constructs assessed. The authors should take care to point out that the decline in peak expression should not be confused with the decline in longitudinal expression, as this is not clear in the discussion; i.e. the subheading, "Factors influencing DREADD expression," might be better written as, "Factors influencing peak DREADD expression," and subsequent wording in this section should specify that these particular data concern peak expression only.

      We appreciate this important clarification. In response, we will revise the title to “Factors influencing peak DREADD expression levels”, and we will specify that our analysis focused on peak ΔBP<sub>ND</sub> values around 60 days post-injection. We will also explicitly distinguish these findings from the later-stage changes in expression seen in the longitudinal PET data in both the Results and Discussion sections.

      Reviewer #2 (Public review):

      Weaknesses

      This study is a meta-analysis of several experiments performed in one lab. The good side is that it combined a large amount of data that might not have been published individually; the downside is that not all experiments were planned and equated, creating a lot of unexplained variance in the data. The authors nonetheless used this judiciously, but one might think that planned and organized multicentric experiments would provide more information and help test more parameters, including some related to inter-individual variability and particular genetic constructs.

      We thank the reviewer for bringing this important point to our attention. We fully agree that the retrospective nature of our dataset, compiled from multiple studies conducted within a single laboratory, introduces variability due to differences in constructs, injection sites, and timelines. While this reflects the real-world constraints of long-term NHP research, we acknowledge the need for more standardized approaches. We will add a statement in the revised Discussion emphasizing that future multicenter and harmonized studies would be valuable for systematically examining specific parameters and inter-individual variability.

      Reviewer #3 (Public review):

      Minor weaknesses are related to a few instances of suboptimal phrasing, and some room for improvement in time course visualization and quantification. These would be easily addressed in a revision.

      These findings will undoubtedly have a very significant impact on the rapidly growing but still highly challenging field of primate chemogenetic manipulations. As such, the work represents an invaluable resource for the community.

      We thank the reviewer for the positive assessment of our manuscript and for the constructive suggestions noted in the “Recommendations for the authors”. In response, we will carefully review and revise the manuscript to improve visualization and quantification.

  2. Mar 2025
    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Evading predation is of utmost importance for most animals and camouflage is one of the predominant mechanisms. Wu et al. set out to test the hypothesis of a unique camouflage system in leafhoppers. These animals coat themselves with brochosomes, which are spherical nanostructures that are produced in the Malpighian tubules and are distributed on the cuticle after eclosion. Based on previous findings on the reflectivity properties of brochosomes, the authors provide very good evidence that these nanostructures indeed reduce the reflectivity of the animals thereby reducing predation by jumping spiders. Further, they identify four proteins, which are essential for the proper development and function of brochosomes. In RNAi experiments, the regular brochosome structure is lost, the reflectivity reduced and the respective animals are prone to increased predation. Finally, the authors provide some phylogenetic sequence analyses and speculate about the evolution of these essential genes.

      Strengths:

      The study is very comprehensive, including careful optical measurements, EM and TM analyses of the nanoparticles and their production line in the Malpighian tubules, in vivo predation tests, and knockdown experiments to identify essential proteins. Indeed, the results are convincingly in line with the starting hypothesis, such that the study robustly assigns a new biological function to the brochosome coating system.

      A key strength of the study is that the biological relevance of the brochosome coating is convincingly shown by an in vivo predation test using a known predator from the same habitat.

      Another major step forward is an RNAi screen that identified four proteins essential for brochosome structure (BSMs). After the respective RNAi knockdowns, the brochosomes show curious malformations that are interesting in terms of the self-assembly of these nanostructures. The optical and in vivo predation tests provide excellent support for the model that RNAi knockdown alters brochosome structure, which impairs their antireflective effect, which in turn weakens their antipredatory effect.

      Thank you very much for your positive feedback and insightful comments on our manuscript. We are delighted that you acknowledge the efforts we have made in studying the components and functions of Brochosomal proteins. We have carefully considered your suggestions and have thoroughly revised the manuscript to address the shortcomings identified in our original submission. We hope that the revised version meets with your approval. Below, please find our detailed point-by-point responses.

      Weaknesses:

      The difference in reflectivity caused by aberrant brochosomes or by ageing is only around 10%. This may seem too small to have an effect in real life. On the other hand, the in vivo predation tests confirm an influence. Hence, this is not a real weakness of the study, just a note to reconsider the wording used to describe the degree of reflectivity.

      Thank you for your valuable suggestions. Based on your recommendations, we have revised the manuscript accordingly. Although the absolute reduction in light reflection due to Brochosomal coverage is approximately 10%, the relative decrease in light reflection on the leafhopper's surface is nearly 30%. Specifically, in the ultraviolet region, the reflection is reduced from about 30% to 20%, and in the visible light region, it is reduced from 20% to 10%. For detailed revisions, please refer to lines 151-156 of the revised manuscript.

      The single-gene knockdowns seemed to lead to a very low penetrance of malformed brochosomes (Figure Supplement 3). Judging from the overview slides, less than 1% of brochosomes may have been affected. A quantification of regular versus abnormal particles in both wild-type and RNAi-treated animals would have helped to exclude the possibility that the aberrant brochosomes shown merely reflect a background level of "normal" defects. Of note, the quadruple knockdown of all BSMs seemed to lead to a high penetrance (Figure 4), which was already reflected in the Malpighian tubule production line. While the data shown are convincing, a quantification might strengthen the argument.

      While the RNAi effects seemed to be very specific to brochosomes and therefore very likely specific, an off-target control for RNAi was still missing. Finding the same/similar phenotype with a non-overlapping dsRNA fragment in one off-target experiment is usually considered required and sufficient. Further, the details of the targeted sequence will help future workers on the topic.

      Thank you for your valuable suggestions. Based on your recommendations, we have synthesized dsRNA targeting two non-overlapping regions of the coding sequences for four Brochosomal structural protein genes. These dsRNAs were injected individually and in combination for each gene. Our RNAi experiments for each BSM gene demonstrated that both individual and combined injections significantly suppressed the expression of the target genes, with the combined injection yielding slightly better silencing efficiency. Statistical analysis of the SEM observations revealed that the combined injection of dsRNAs targeting two non-overlapping regions led to a 60-70% reduction in the surface area coverage of Brochosomes. Additionally, approximately 20% of the remaining Brochosomes exhibited significant morphological changes. For detailed revisions, please refer to lines 199-211 of the revised manuscript, as well as Figures 3A and 3C, and Supplementary Figures 4 and 5.

      The main weakness in the current manuscript may be the phylogenetic analysis and the model of how the genes evolved. Several aspects were not clearly or consistently stated such that I felt unsure about what the authors actually think. For instance: Are all the 4 BSMs related to each other or only BSM2 and 3? If so, not only BSM2 and 3 would be called "paralogs" but also the other BSMs. If they were all related, then a phylogenetic tree including all BSMs should be shown to visualize the relatedness (including the putative ancestral gene if that is the model of the authors). Actually, I was not sure about how the authors think about the emergence of the BSMs. Are they real orphan genes (i.e. not present outside the respective clade) or was there an ancestral gene that was duplicated and diverged to form the BSMs? Where in the phylogeny does the first of the BSMs or ancestral proteins emerge (is the gene found in Clastoptera arizonana the most ancestral one?)? Maybe, the evolution of the BSMs would have to be discussed individually for each gene as they show somewhat different patterns of emergence and loss (BSM4 present in all species, the others with different degrees of phylogenetic restriction).

      Thank you very much for your constructive feedback on our phylogenetic analysis and the modeling of gene evolution. We fully agree with your insights and acknowledge that the evolutionary analysis of BSM genes remains somewhat ambiguous. This ambiguity is primarily due to the limited research on the precise structural protein composition of Brochosomes. While proteomics studies have analyzed and discussed the structural proteins of Brochosomes, the accurate composition of these proteins is still poorly understood. In this study, we identified four BSM proteins, but given the intricate structure of Brochosomes as proteinaceous spheres, we believe there may be additional BSM genes that have not yet been identified. Moreover, despite the presence of over ten thousand species within the Cicadomorpha, only three species have genome sequences available, and fewer than a hundred species have transcriptome sequencing data. The scarcity of research on Brochosomes, as well as the limited availability of genomic and transcriptomic data, poses significant challenges for our phylogenetic analysis and understanding of BSM gene evolution.

      Based on your suggestions, we have revised the manuscript accordingly. Specifically, we have updated Figure 5C by including ten additional species from Cereopoidea, Cicadoidea, and Fulgoroidea to better illustrate that BSM genes are true orphan genes. We have also added a phylogenetic tree of BSM genes within Cicadidae in Supplementary Figure 3. Additionally, we have expanded the discussion of BSM gene evolution in the manuscript (lines 503-556). For detailed revisions, please refer to Figure 5C, Supplementary Figure 3, and lines 507-585 of the revised manuscript.

      Related to these questions I remained unsure about some details in Figure 5. On what kind of analysis is the phylogeny based? Why are some species not colored, although they are located on the same branch as colored ones? What is the measure for homology values - % identity/similarity? The homology labels for Nephotetix cincticeps and N. virescens seem to be flipped: the latter is displayed with 100% identity for all genes with all proteins while the former should actually show this. As a consequence of these uncertainties, I could not fully follow the respective discussion and model for gene evolution.

      Thank you very much for your insightful comments and suggestions. We have carefully considered your feedback and have thoroughly revised our manuscript accordingly. Specifically, we have enhanced the description of the phylogenetic analysis process to provide greater clarity and transparency, with the detailed methods now included in lines 789-798. Regarding Figure 5C, we appreciate your attention to the coloring scheme. We would like to clarify that the family Cicadellidae comprises 25 subfamilies, many of which are represented by only one species in our figure. To ensure clarity and meaningful representation, we have chosen to color only those subfamilies with more than three species, thereby avoiding visual clutter and emphasizing the most relevant taxonomic groups. Additionally, we have corrected the inverted homology labels for Nephotetix cincticeps and Nephotetix virescens to ensure the accuracy and consistency of our data presentation.

      Conclusion:

      The authors successfully tested their hypothesis in a multidisciplinary approach and convincingly assigned a new biological function to the brochosome system. The results fully support their claims; only a quantification of the penetrance in the RNAi experiments would help to strengthen the point. The authors' analysis of the evolution of BSM genes remained a bit vague, and I remained unsure about their respective conclusions.

      The work is a very interesting study case of the evolutionary emergence of a new system to evade predators. Based on this study, the function of the BSM genes could now be studied in other species to provide insights into putative ancestral functions. Further, studying the self-assembly of such highly regular complex nano-structures will be strongly fostered by the identification of the four key structural genes.

      Reviewer #1 (Recommendations for the authors):

      Main manuscript:

      Please consider the annotated pdf with suggestions for wording and comments at the authors' discretion:

      Thank you very much for your detailed suggestions and comments provided in the annotated PDF. We have carefully reviewed each of your points and have revised the manuscript accordingly. All changes have been highlighted in red text for your convenience. The revised manuscript with tracked changes is available for your review. We believe these revisions have improved the clarity and quality of our manuscript. Thank you again for your valuable feedback.

      Supplementary Figure 2 C:

      Y-axes:

      - label: "surface coverage in %"

      - there are different scale values for the different days (e.g. 80-105 for day 5 and 0-80 at day 25). As a comparison between days is interesting, it would help to have the same scale values for all. That would show the decrease more intuitively.

      Thank you very much for your suggestion regarding the Y-axis in Supplementary Figure 2C. We agree that using a consistent scale across all time points is essential for clear and intuitive comparison. In the revised manuscript, we have standardized the Y-axis scale for Supplementary Figure 2C to a uniform range of 0-100% for all days. This change allows for a more straightforward visualization of the decreasing trend in surface coverage over time.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors investigate the optical properties of brochosomes produced by leafhoppers. They hypothesize that brochosomes reduce light reflection on the leafhopper's body surface, aiding in predator avoidance. Their hypothesis is supported by experiments involving jumping spiders. Additionally, the authors employ a variety of techniques including micro-UV-Vis spectroscopy, electron microscopy, transcriptome and proteome analysis, and bioassays. This study is highly interesting, and the experimental data is well-organized and logically presented.

      Strengths:

      The use of brochosomes as a camouflage coating has been hypothesized since 1936 (R.B. Swain, Entomol. News 47, 264-266, 1936), with evidence demonstrated by similar synthetic brochosome systems in a number of recent studies (S. Yang, et al. Nat. Commun. 8:1285, 2017; L. Wang, et al., PNAS. 121: e2312700121, 2024). However, direct biological evidence or relevant field studies have been lacking to support the hypothesis that brochosomes are used for camouflage. This work provides the first biological evidence demonstrating that natural brochosomes can serve as a camouflage coating that reduces the observability of leafhoppers to their predators. The design of the experiments is novel.

      We are extremely grateful for your positive feedback and insightful comments on our manuscript. We are delighted that you have recognized the efforts we have put into our research on how brochosomes serve as a camouflage coating to reduce the detectability of leafhoppers to their predators. We have carefully considered your suggestions and have thoroughly revised the manuscript to address the shortcomings of the original version. We hope that the revised version meets with your approval. Below, please find our detailed point-by-point responses.

      Weaknesses:

      (1) The observation that brochosome coatings become sparse after 25 days in both male and female leafhoppers, resulting in increased predation by jumping spiders, is intriguing. However, since leafhoppers consistently secrete and groom brochosomes, it would be beneficial to explore why brochosomes become significantly less dense after 25 days.

      Thank you very much for your valuable suggestions. We appreciate your interest in the reduction of brochosomal density on the surface of leafhoppers after 25 days. We believe that the primary reason for the decreased density of brochosomes on the leafhopper surface after 25 days is the reduced synthesis and secretion of brochosomes. The Malpighian tubules are the main sites for brochosome synthesis. As shown in Figure 2D and Supplementary Figure 1, the thick glandular segments of the Malpighian tubules in both male and female leafhoppers begin to atrophy 15 days after reaching adulthood. This indicates a gradual decline in brochosome synthesis and secretion after day 15 of adulthood. Following your suggestion, we have revised the discussion section of the manuscript to elaborate on this observation. The detailed changes can be found in lines 474-491 of the revised manuscript.

      (2) The authors demonstrate that brochosome coatings reduce UV (specular) reflection compared to surfaces without brochosomes, which can be attributed to the rough geometry of brochosomes as discussed in the literature. However, it would be valuable to investigate whether the proteins forming the brochosomes are also UV absorbing.

      Thank you very much for your valuable suggestions. Following your advice, we have successfully expressed four BSM genes in a prokaryotic system, purified the corresponding proteins, and applied them to quartz glass surfaces. We then measured the light reflectance of the quartz glass surfaces coated with these purified proteins. The results showed that the purified BSM proteins did not exhibit better antireflective properties compared to the control GST protein. For more details, please refer to Supplementary Figure 8 in the revised manuscript. We believe that the excellent antireflective properties of brochosomes are fundamentally due to their unique geometric shapes. The hollow pores within the brochosomes, with diameters of approximately 100 nm, are significantly smaller than most wavelengths in the visible spectrum. When light passes through these tiny pores, diffraction occurs, while light passing through the ridges of the brochosomes causes scattering. The interference between the diffracted and scattered light from these pores and ridges results in the observed extinction characteristics of brochosomes. We have incorporated these insights into the discussion section of the revised manuscript (lines 416-425 and lines 432-442 of the revised manuscript).

      (3) The experiments with jumping spiders show that brochosomes help leafhoppers avoid predators to some extent. It would be beneficial for the authors to elaborate on the exact mechanism behind this camouflage effect. Specifically, why does reduced UV reflection aid in predator avoidance? If predators are sensitive to UV light, how does the reduced UV reflectance specifically contribute to evasion?

      Thank you very much for your valuable suggestions. Based on your advice, we have included a detailed discussion on how reducing ultraviolet (UV) reflection can help insects avoid predation. The revised content can be found in lines 445-460 of the revised manuscript.

      “UV light serves as a crucial visual cue for various insect predators, enhancing foraging, navigation, mating behavior, and prey identification (Cronin & Bok, 2016; Morehouse et al., 2017; Silberglied, 1979). Predators such as birds, reptiles, and predatory arthropods often rely on UV vision to detect prey (Church et al., 1998; Li & Lim, 2005; Zou et al., 2011). However, UV reflectance from insect cuticles can disrupt camouflage, increasing the risk of detection and predation, as natural backgrounds like leaves, bark, and soil typically reflect minimal UV light (Endler, 1997; Li & Lim, 2005; Tovee, 1995). To mitigate this risk, insects often possess anti-reflective cuticular structures that reduce UV and broad-spectrum light reflectance. This strategy is widespread among insects, including cicadas, dragonflies, and butterflies, and has been shown to decrease predator detection rates (Hooper et al., 2006; Siddique et al., 2015; Zhang et al., 2006). For example, the compound eyes of moths feature hexagonal protuberances that reduce UV reflectance, aiding nocturnal concealment (Blagodatski et al., 2015; Stavenga et al., 2005). In butterflies, UV reflectance from eyespots on wings can attract predators, but reducing UV reflectance or eyespot size can lower predation risk and enhance camouflage (Chan et al., 2019; Lyytinen et al., 2004). Hence, the reflection of ultraviolet light from the insect cuticle surface increases the risk of predation by disrupting camouflage (Tovee, 1995).”

      (4) An important reference regarding the moth-eye effect is missing. Please consider including the following paper: Clapham, P. B., and M. C. Hutley. "Reduction of lens reflection by the 'Moth Eye' principle." Nature 244: 281-282 (1973).

      Thank you very much for pointing out the omission of the important reference on the “moth eye” effect. We sincerely apologize for the oversight. Based on your suggestion, we have now included the seminal paper by Clapham and Hutley (1973) in the revised manuscript. The reference has been added to both the Introduction and Discussion sections to provide a more comprehensive context for our discussion on anti-reflective structures in insects.

      (5) The introduction should be revised to accurately reflect the related contributions in the literature. Specifically, the novelty of this work lies in the demonstration of the camouflage effect of brochosomes using jumping spiders, which is verified for the first time in leafhoppers. However, the proposed use of brochosome powder for camouflage was first described by R.B. Swain (R.B. Swain, Notes on the oviposition and life history of the leafhopper Oncometopia undata Fabr. (Homoptera: Cicadellidae), Entomol. News. 47: 264-266 (1936)). Recently, the antireflective and potential camouflage functions of brochosomes were further studied by Yang et al. based on synthetic brochosomes and simulated vision techniques (S. Yang, et al. "Ultra-antireflective synthetic brochosomes." Nature Communications 8: 1285 (2017)). Later, Lei et al. demonstrated the antireflective properties of natural brochosomes in 2020 (C.-W. Lei, et al., "Leafhopper wing-inspired broadband omnidirectional antireflective embroidered ball-like structure arrays using a nonlithography-based methodology." Langmuir 36: 5296-5302 (2020)). Very recently, Wang et al. successfully fabricated synthetic brochosomes with precise geometry akin to the natural ones, and further elucidated the antireflective mechanisms based on brochosome geometry and their role in reducing the observability of leafhoppers to their predators (L. Wang et al. "Geometric design of antireflective leafhopper brochosomes." Proceedings of the National Academy of Sciences 121: e2312700121 (2024)).

      Thank you very much for your valuable suggestions regarding the revision of the introduction to accurately reflect the relevant contributions in the literature. Based on your feedback, we have thoroughly revised the introduction and added the suggested references to provide a comprehensive context for our study. The details of these revisions can be found in lines 84-94 of the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) In Figure 2E, the data for Male-5d appears to be missing. Please verify and ensure all relevant data is included.

      Thank you for pointing out the issue regarding the data presentation in Figure 2E. We apologize for any confusion caused by the overlapping data points and the less conspicuous color choice for Male-5d. We have carefully reviewed the data and confirmed that all relevant data points, including Male-5d, are indeed present in the dataset. In the revised manuscript, we have adjusted the color scheme for Male-5d and Female-5d in Figure 2E to ensure that both curves are clearly distinguishable, even in areas where they overlap. This adjustment should facilitate a more accurate and convenient observation of the data trends. We appreciate your attention to detail, and we believe these revisions have improved the clarity and readability of the figure.

      (2) In Figure 6, please clarify the reflectance data in the inset. Clearly explain what the blue and light blue curves represent.

      Thank you for your suggestion regarding Figure 6. We have revised the figure to improve clarity. The light blue curve now represents the reflectance measurements of leafhoppers with higher brochosome coverage, while the dark blue curve corresponds to those with lower coverage. These changes, along with updated labels in the figure legend, ensure that the data are clearly distinguishable and easy to interpret. We appreciate your feedback and believe these revisions have enhanced the overall clarity of the figure.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Weaknesses (clarifications needed):

      (1) Experimental Design:

      The study does not mention whether the authors examined sex differences or any measures of attractiveness or hierarchy among participants (e.g., students vs. teachers). Including these variables could provide a more nuanced understanding of group dynamics.

      We are grateful to the reviewer for raising this valuable question. We have added a statement that future studies should examine sex differences and measures of attractiveness or hierarchy among participants (e.g., students vs. teachers) (p. 27).

      “Finally, future research should investigate additional variables, including sex differences and measures of attractiveness or hierarchy among participants, such as students versus teachers.”  p. 27

      (2) fNIRS Data Acquisition:

      The authors' approach to addressing individual differences in anatomy is lacking in detail. Understanding how they identified the optimal channels for synchrony between participants would be beneficial. Was this done by averaging to find the location with the highest coherence?

      We apologize for missing some details here. We have included the following information in the fNIRS data acquisition and fNIRS data analyses to clarify the details (pp. 8 and 12).

      We used one-sample t-tests to compare GNS between the baseline and task sessions and thereby identify channels of interest. This analysis did not search for the location of maximum coherence; rather, it identified the channels in which GNS differed significantly between the two sessions, which we designated as relevant to the group decision-making task. In addition, we selected the PFC and left TPJ as target brain regions a priori, guided by existing literature.

      “Two optode probe sets were used to cover each participant's prefrontal and left TPJ regions (Figure S1). The DLPFC plays a crucial role in group decision-making processes, with findings suggesting that individuals exhibiting reduced prefrontal activity were more prone to out-group exclusion and demonstrated stronger in-group preferences (Goupil et al., 2021; Jankovic, 2014; Yang et al., 2020). Similarly, the left TPJ has been previously reported to be associated with decision-making and information exchange (Freitas et al., 2019; Tindale et al., 2019).”  p. 8

      “Time-averaged GNS (also averaged across channels in each group) was compared between the baseline session (i.e., the resting phase) and the task session (from reading information to making decisions) using a series of one-sample t-tests. Here, p-values were thresholded by controlling for FDR (p < 0.05; Benjamini & Hochberg, 1995). When determining the frequency band of interest, the time-averaged GNS was also averaged across channels. After that, we analyzed the time-averaged GNS of each channel. Then, channels showing significant GNS were regarded as regions of interest and included in subsequent analyses.” p. 12
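      The FDR correction cited in the passage above (Benjamini & Hochberg, 1995) can be illustrated with a minimal sketch of the standard step-up procedure. This is not the authors' analysis code; the function name `benjamini_hochberg` and the example per-channel p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of hypotheses rejected under Benjamini-Hochberg FDR control."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices that sort p ascending
    ranked = p[order]
    thresholds = alpha * np.arange(1, m + 1) / m   # (k / m) * alpha for rank k
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k_max = np.nonzero(below)[0].max()         # largest rank that meets its threshold
        reject[order[: k_max + 1]] = True          # reject every hypothesis up to that rank
    return reject
```

      For example, with hypothetical channel p-values `[0.001, 0.008, 0.039, 0.041]`, all four are rejected at `alpha=0.05`: 0.039 exceeds its own threshold (0.0375), but the largest p-value (0.041) is below its threshold (0.05), and the step-up rule then rejects every smaller p-value as well.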

      (3) Behavioral Analysis:

      For group identification, the analysis currently uses a dichotomous approach. Introducing a regression model to capture the degree of identification could offer more granular insights into how varying levels of group identification affect collective behavior and performance.

      Thank you for your suggestion. As suggested, we have fitted a regression model to examine how varying levels of group identification affect collective performance, with the group identification score as the independent variable and collective performance as the dependent variable (pp. 9 and 15).

      “Moreover, we employed a regression model to examine how varying levels of group identification affect collective performance, using group identification scores as the independent variable and collective performance as the dependent variable.”  p.9

      “The results from the regression model highlighted a significant association between the degree of group identification and collective performance (β = 0.45, t = 4.56, p = 0.019).”  p.15
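      A regression of this kind (one predictor plus an intercept) can be sketched as ordinary least squares with the usual slope t-statistic. This is an illustrative sketch only: the function name `simple_regression` is hypothetical, and the response does not state whether the reported β is standardized. The sketch returns the unstandardized slope; z-scoring both variables first would yield a standardized β.

```python
import numpy as np

def simple_regression(x, y):
    """OLS of y on x with an intercept; returns (slope, standard error of slope, t statistic)."""
    x = np.asarray(x, dtype=float)   # e.g., group identification scores (hypothetical data)
    y = np.asarray(y, dtype=float)   # e.g., collective performance (hypothetical data)
    n = x.size
    xc = x - x.mean()
    slope = np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    sigma2 = np.sum(resid ** 2) / (n - 2)          # residual variance with n - 2 degrees of freedom
    se = np.sqrt(sigma2 / np.sum(xc ** 2))         # standard error of the slope
    return slope, se, slope / se
```

      The p-value would then come from a t distribution with n - 2 degrees of freedom.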

      (4) Single Brain Activation Analysis:

      The application of the General Linear Model (GLM) is unclear, particularly given the long block durations and absence of multiple trials. Further explanation is needed on how the GLM was implemented under these conditions.

      Thank you for your suggestion. We have added more details to this section (p. 11).

      “In the GLM analysis, HbO was the dependent variable, and regressors were defined for the different task stages (a. reading information, b. sharing private information, c. discussing information, d. decision). We then convolved each regressor with the hemodynamic response function (HRF) and obtained, through regression analysis, the brain activation β value of each participant in each channel at each task stage.”  p.11
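      A block-design GLM of the kind described above can be sketched as follows: one boxcar regressor per task stage, each convolved with an HRF, plus an intercept, fitted to a channel's HbO time series by least squares. Everything specific here is an assumption for illustration: the SPM-style double-gamma HRF parameters, the sampling interval, the stage timings, and the function names `canonical_hrf`, `design_matrix` and `fit_glm`.

```python
import math
import numpy as np

def canonical_hrf(dt, duration=32.0, peak=6.0, undershoot=16.0, ratio=1 / 6):
    """Double-gamma HRF sampled every dt seconds (SPM-style defaults, assumed here)."""
    t = np.arange(0.0, duration, dt)

    def gamma_pdf(x, shape):  # gamma density with unit scale
        return x ** (shape - 1) * np.exp(-x) / math.gamma(shape)

    h = gamma_pdf(t, peak) - ratio * gamma_pdf(t, undershoot)
    return h / h.sum()

def design_matrix(stage_blocks, n_samples, dt):
    """One HRF-convolved boxcar per (onset, offset) stage in seconds, plus an intercept column."""
    hrf = canonical_hrf(dt)
    cols = []
    for onset, offset in stage_blocks:
        box = np.zeros(n_samples)
        box[int(round(onset / dt)):int(round(offset / dt))] = 1.0
        cols.append(np.convolve(box, hrf)[:n_samples])  # truncate to the signal length
    cols.append(np.ones(n_samples))
    return np.column_stack(cols)

def fit_glm(hbo, X):
    """Least-squares beta estimates, one per column of X (stages first, intercept last)."""
    betas, *_ = np.linalg.lstsq(X, hbo, rcond=None)
    return betas
```

      With a single long block per stage, as in this task, each stage beta is estimated from one block per participant and channel.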

      (5) Within-group neural Synchrony (GNS) Calculation:

      The method for calculating GNS could be improved by using mutual information instead of pairwise summation, as suggested by Xie et al. (2020) in their study on fMRI triadic hyperscanning. Additionally, the explanation of GNS calculation is inconsistent. At one point, it is mentioned that GNS was averaged across time and channels, while elsewhere, it is stated that channels with the highest GNS were selected. Clarification on this point is essential.

      We thank the reviewer for raising this point. We used a conventional GNS calculation approach, as detailed in Line 296 of the manuscript: WTC was computed for each pair of group members, and the pairwise values were then averaged to obtain GNS. Further details regarding the second question have been provided in the revised manuscript (p. 12).
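      The pairwise-then-average step described above can be sketched in a few lines. Note that the pairwise measure below is a deliberately swapped-in stand-in (absolute Pearson correlation), not wavelet transform coherence; in practice a WTC-based function would be passed as `pair_measure`. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np

def pearson_abs(x, y):
    """Placeholder pairwise synchrony measure (stand-in only, NOT wavelet transform coherence)."""
    return abs(np.corrcoef(x, y)[0, 1])

def group_neural_synchrony(signals, pair_measure=pearson_abs):
    """Average a pairwise synchrony measure over all pairs of group members."""
    pairs = combinations(range(len(signals)), 2)   # a triad yields (0,1), (0,2), (1,2)
    values = [pair_measure(signals[i], signals[j]) for i, j in pairs]
    return float(np.mean(values))
```

      For a triad this averages three pairwise values; a multivariate measure such as mutual information, as the reviewer suggests, would replace this averaging step rather than the pairwise function.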

      (6) Placement of fNIRS Probes:

      The probes were only placed in the frontal regions, despite literature suggesting that the superior temporal sulcus (STS) and temporoparietal junction (TPJ) regions are crucial for triadic team performance. A justification for this choice or inclusion of these regions in future studies would be beneficial.

      As stated in the original manuscript, we used two optode probe sets to cover the prefrontal and left TPJ regions of each participant (see Figure S1, p. 8).

      (7) Interpretation of fNIRS Data:

      Given that fNIRS signals are slow, similar to BOLD signals in fMRI, the interpretation of Figure 6 raises concerns. It suggests that it takes several minutes (on the order of 4-5 minutes) for people to collaborate, which seems implausible. More context or re-evaluation of this interpretation is needed.

      The question you have pointed out is very pertinent, and we have added more explanation for this result (pp. 25-26).

      As previous studies have shown, the hemodynamic signal measured by fNIRS, like the BOLD signal in fMRI, rises slowly relative to neuronal activity; that is, it lags behind the underlying neural events (Turner et al., 1998). In social interactions such as group decision-making, neural synchronization also emerges with a delay, because people need time to accumulate dialogue, improve collaboration efficiency, and converge on shared preferences (Zhang et al., 2019). For example, a study of group consensus found that participants showed significant neural alignment only after completing a period of dialogue (Sievers et al., 2024). In cooperative tasks, neural synchronization increases as tacit understanding between two participants improves (Cui et al., 2012). Thus, neural synchronization depends on interaction accumulated over time. We therefore believe that the 4-5 minutes of collaboration time shown in Figure 6 may reflect the time team members need to establish consensus and shared preferences, which is captured by the dynamic change in neural synchronization.

      Moreover, previous studies of neural synchronization during social interaction and group decision-making found that substantial neural synchronization emerged around 50-55 seconds into a teaching task involving prior knowledge (Liu et al., 2019) and approximately 6 minutes into a group discussion period (Xie et al., 2023). Together, these results support the time scale of the fNIRS responses observed in our study (pp. 25-26).

      “Our study also demonstrated significant increases in single-brain activation, DLPFC-OFC functional connectivity, and GNS at 7, 12, and 17 minutes, respectively, after task initiation. Together, these increases constitute the two-in-one neural model we propose to explain how group identification influences collective performance. As previous studies have shown, the hemodynamic signal recorded by fNIRS, like the BOLD signal, rises slowly relative to neuronal activity and therefore lags behind it (Turner et al., 1998). In social interactions such as group decision-making, neural synchronization emerges with a delay, because people need time and repeated exchanges of dialogue to improve collaboration efficiency and converge on the same preference (Zhang et al., 2019). For example, participants exhibited significant neural alignment, but only after they had completed a period of dialogue (Sievers et al., 2024). In cooperative tasks, the degree of neural synchronization increases as cooperation efficiency between two participants improves (Cui et al., 2012). The generation of neural synchronization therefore depends on interaction over a period of time, which can affect the estimation of collaboration time. Prior research has shown that significant neural synchronization between teacher and students emerged 50-55 seconds after the start of a teaching task involving prior knowledge, indicating that students and teacher had converged on the shared goal of learning (Liu et al., 2019). Moreover, a noteworthy increase in GNS was observed approximately 6 minutes into a group discussion period aimed at discussing and solving a problem (Xie et al., 2023). These findings are similar to ours. Therefore, the time points we found may reflect the temporal dynamics of the neural processes underlying team collaboration.” pp.25-26

      Reviewer #2 (Public review):

      Weaknesses:

      The authors need to clearly articulate their hypothesis regarding why neural synchronization occurs during social interaction. For example, in line 284, it is stated that "It is plausible that neural synchronization is closely associated with group identification and collective performance...", but this is far from self-evident. Neural synchronization can occur even when people are merely watching a movie (Hasson et al., 2004), and movie-watchers are not engaged in collective behavior. There is no direct link between the IBS and collective behavior. The authors should explain why they believe inter-brain synchronization occurs in interactive settings and why they think it is related to collective behavior/performance.

      Thank you for bringing these points to our attention. We have clarified the relationship between neural synchronization and collective behavior in the Introduction (p.4). Moreover, to investigate whether neural synchronization stems from a shared task or environment, we pseudo-randomized the assignment of participants to groups and created a null distribution of 1,000 pseudo-groups, as described in Lines 311-315. This approach allowed us to rule out neural synchronization arising from factors other than social interaction and to identify neural patterns associated with collective performance (p.12).

      “Moreover, Ni et al. (2024) indicated that neural synchronization was linked to the strength of social-emotional communication and connections between individuals. An increase in neural synchronization has also been shown to predict the coordination and cooperation abilities of group members (Lu et al., 2023). Therefore, we hypothesize that neural synchronization may be related to group performance.” p.4

      “After that, the nonparametric permutation test was conducted on the observed interaction effects on GNS of the real group against the 1,000 permutation samples. By pseudo-randomizing the data of all participants, a null distribution of 1,000 pseudo-groups was generated (e.g., time series from member 1 in group 1 were grouped with member 2 in group 2 and member 3 in group 3). The GNS of 1,000 reshuffled pseudo-groups was computed, and the GNS of the real groups was assessed by comparing it with the values generated by the 1,000 reshuffled pseudo-groups.” p.12
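      The pseudo-group permutation test described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data layout (groups × 3 members × time points), the function names, and the use of mean pairwise Pearson correlation as a stand-in for the wavelet-coherence-based GNS are all hypothetical.

```python
import itertools
import numpy as np


def gns(signals):
    """Stand-in for group neural synchronization (GNS) of one triad:
    the mean pairwise Pearson correlation between members' HbO series.
    (Hypothetical; the study computed GNS from wavelet coherence.)"""
    pairs = itertools.combinations(range(len(signals)), 2)
    return float(np.mean([np.corrcoef(signals[i], signals[j])[0, 1] for i, j in pairs]))


def null_distribution(data, n_perm=1000, seed=0):
    """data: array of shape (n_groups, n_members, n_timepoints).
    Each permutation builds n_groups pseudo-groups in which every member
    is drawn from a different real group, then records their mean GNS."""
    rng = np.random.default_rng(seed)
    n_groups, n_members, _ = data.shape
    null = np.empty(n_perm)
    for b in range(n_perm):
        values = []
        for _ in range(n_groups):
            # member k of the pseudo-group comes from real group sources[k]
            sources = rng.choice(n_groups, size=n_members, replace=False)
            values.append(gns([data[g, k] for k, g in enumerate(sources)]))
        null[b] = np.mean(values)
    return null


def permutation_p(data, n_perm=1000, seed=0):
    """One-sided test: is the real groups' mean GNS larger than chance?"""
    observed = float(np.mean([gns(group) for group in data]))
    null = null_distribution(data, n_perm, seed)
    # +1 in numerator and denominator so the p-value is never exactly zero
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, null, p
```

      Because each permutation rebuilds a full set of pseudo-groups, the null distribution describes the mean GNS across groups, which is the same quantity computed for the real groups.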

      The authors state that "GNS in the OFC was a reliable neuromarker, indicating the influence of group identification on collective performance," but this claim is too strong. Please refer to Figure 4B. Do the authors really believe that collective performance can be predicted given the correlation with the large variance shown? There is a significant discrepancy between observing a correlation between two variables and asserting that one variable is a predictive biomarker for the other.

      Thank you for your suggestion. We have revised the relevant statement (p.18).

      “Through correlation and regression model analysis, we found that in group decision-making, the increase in group identity would affect group performance by improving GNS in the OFC brain region.”  p.18

      Why are the individual answers being analyzed as collective performance (See, L-184)? Although these are performances that emerge after the group discussion, they seem to be individual performances rather than collective ones. Typically, wouldn't the result of a consensus be considered a collective performance? The authors should clarify why the individual's answer is being treated as the measure of collective performance.

      We appreciate this insightful comment. Our decision to use individual responses as a measure of collective performance rests on two considerations. First, previous studies using various hidden-profile tasks have used averaged individual scores to represent collective performance (e.g., Stasser et al., 1995; Wittenbaum et al., 1996; Brockner et al., 2022). Second, although consensus outcomes are typically regarded as collective expressions, we argue that in this study individual responses are not independent entities but extensions of the group decision-making process. The collective deliberation substantially shaped individual thinking and decision-making: through group discussion, members shared perspectives, adjusted their stances, and formulated their responses based on collective insights. The participants' responses were therefore shaped by the dynamics of the group conversation, serving as an indirect measure of group performance and potentially indicating the efficacy of collective deliberation.

      Performing SPM-based mapping followed by conducting a t-test on the channels within statistically significant regions constitutes double dipping, which is not an acceptable method (Kriegeskorte et al., 2009). This issue is evident in, for example, Figures 3A and 4A.

      Please refer to the following source: https://www.nature.com/articles/nn.2303

      We have carefully reviewed the article provided by the reviewer, and we acknowledge the concerns regarding selective analysis and double dipping in our statistical approach. To address this, we have clarified this issue further in the Discussion section (pp.26-27).

      Our study introduces a novel perspective while utilizing conventional fNIRS-based hyperscanning analyses (Liu et al., 2019; Pärnamets et al., 2020; Reinero et al., 2021; Számadó et al., 2021; Solansky, 2011), methods that are widely endorsed within the field. In our analysis, significant channels were first identified using a one-sample t-test, followed by additional analyses including ANOVA, independent samples t-tests, and other procedures. We would like to emphasize that the statistical assumptions underlying the one-sample t-test and paired-sample t-test in our study maintain a level of independence. Moreover, to further mitigate concerns about the potential for double dipping, we employed permutation testing to validate the robustness of our results and ensure that our findings are not influenced by biases inherent in the selection of significant regions.

      We recognize the importance of rigorous statistical practices and are committed to upholding the highest standards of analysis. As such, we have revisited our methodology and included a more detailed explanation of the steps taken to avoid double dipping and ensure the integrity of our analyses in the revised manuscript.

      “Although our study offers a new perspective, the analysis follows traditional fNIRS-based hyperscanning analyses (Liu et al., 2019; Pärnamets et al., 2020; Reinero et al., 2021; Számadó et al., 2021; Solansky, 2011), which are generally accepted by most fNIRS hyperscanning researchers. For example, we first identified significant channels through a one-sample t-test and then conducted further analyses, such as ANOVA or independent-samples t-tests. Selective analysis is a powerful tool and is perfectly justified whenever the results are statistically independent of the selection criterion under the null hypothesis (Kriegeskorte et al., 2009). However, it may lead to double dipping and to missing information. In this study, because the analyzed data showed no statistically significant TPJ activation, the TPJ was excluded from further analysis. Future studies should make such selection steps explicit and ensure the reliability of the results with appropriate statistical methods (e.g., cross-validation, independent data sets, or techniques that control for selection bias).” pp.26-27

      In several key analyses within this study (e.g., single-brain activation in the paragraph starting from L398, neural synchronization in the paragraph starting from L393), the TPJ is mentioned alongside the DLPFC. However, in subsequent detailed analyses, the TPJ is entirely ignored.

      We thank the reviewer for this careful review and valuable comment. The TPJ is referenced in certain analyses within this paper (see the paragraphs starting at L414 and L440); however, it is not examined further in the subsequent, more detailed analyses because the analyzed data showed no statistically significant TPJ activation. As the reviewer points out, this ROI-based approach has limitations, which we now address in the Discussion section (p.27).

      The method for analyzing single-brain activation is unclear. Although it is mentioned that GLM (generalized linear model) was used, it is not specified what regressors were prepared, nor which regressor's β-values are reported as brain activity. Without this information, it is difficult to assess the validity of the reported results.

      We have revised the relevant description to clarify the analysis of single-brain activation (p.11).

      While the model illustrated in Figure 7 seems to be interesting, for me, it seems not to be based on the results of this study. This is because the study did not investigate the causal relationships among the three metrics. I guess, Figure 5D might be intended to explain this, but the details of the analysis are not provided, making it unclear what is being presented.

      We regret the confusion that has arisen. Firstly, as highlighted by the reviewer, the model depicted in Figure 7 is not directly derived from the causal analysis conducted in this study. Our investigation did not directly explore the causal relationships among the three indicators; instead, we constructed a model based on correlations and potential mechanisms. In the revised manuscript, we have explicitly stated that Figure 7 represents a descriptive model (p.22).

      Regarding Figure 5D, the reviewer noted that, while it may offer some explanatory value, it lacked the analytical detail needed to make its meaning clear. We have now described the analysis behind Figure 5 in detail (pp.13-14). The model in Figure 5D examined (i) the relationship between the similarity in individual-collective performance and the correlation of brain activation, and (ii) whether the effect of each individual’s single-brain activation on the corresponding group’s GNS was moderated by their brain activation connectivity.

      “Finally, we employed correlation and moderation analyses to assess whether brain activation connectivity could explain the relationship between individuals’ single-brain activation and the related group’s GNS. We examined the relationship between the similarity in individual-collective performance and the correlation of brain activation, as well as whether the effect of each individual’s single-brain activation on the corresponding group’s GNS was moderated by their brain activation connectivity. We used the PROCESS tool in SPSS to test the proposed moderation effect. Specifically, we applied Model 1 with 5,000 bootstrap resamples to examine the interaction between the independent variable (i.e., single-brain activation) and the moderator (i.e., brain activation connectivity) in predicting the dependent variable (i.e., GNS). Prior to analysis, all variables in the moderation model were mean-centered to reduce multicollinearity and improve the interpretability of interaction terms.” p.13-14
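      The moderation model described in this passage can be approximated outside SPSS. The sketch below is a hypothetical Python analogue of PROCESS Model 1, not the PROCESS implementation: it fits y = b0 + b1·x + b2·w + b3·x·w by ordinary least squares on mean-centred predictors and forms a percentile bootstrap interval for the interaction b3 by resampling cases. Variable roles follow the quoted text (x = single-brain activation, w = brain activation connectivity, y = GNS); the function name is illustrative.

```python
import numpy as np


def moderation_bootstrap(x, w, y, n_boot=5000, seed=0):
    """Simple moderation model y = b0 + b1*x + b2*w + b3*(x*w).
    Returns the interaction coefficient b3 and a 95% percentile
    bootstrap confidence interval from case resampling."""
    x, w, y = (np.asarray(v, dtype=float) for v in (x, w, y))

    def fit(xs, ws, ys):
        xc, wc = xs - xs.mean(), ws - ws.mean()   # mean-centre the predictors
        X = np.column_stack([np.ones_like(xc), xc, wc, xc * wc])
        coef, *_ = np.linalg.lstsq(X, ys, rcond=None)
        return coef

    b3 = fit(x, w, y)[3]
    rng = np.random.default_rng(seed)
    n = len(y)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample cases with replacement
        boots[i] = fit(x[idx], w[idx], y[idx])[3]
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return b3, (lo, hi)
```

      Mean-centring changes the meaning of b1 and b2 (each becomes the effect at the mean of the other predictor) but leaves b3 unchanged; its benefit here is interpretability and lower collinearity between the main effects and the product term.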

      “Building on the above results, we have developed a two-in-one neural model that explains how group identification influences collective performance. This descriptive model aims to illustrate the potential interrelationships among these indicators and establish a conceptual framework to inspire forthcoming research endeavors.”  p.21

      The details of the experiment are not described at all. While I can somewhat grasp what was done abstractly, the lack of specific information makes it impossible to replicate the study.

      As suggested, we have clarified the details of the experiment in the manuscript.

      (1) As stated in the public review, the details of the experiment are not described at all and while I can somewhat grasp what was done abstractly, the lack of specific information makes it impossible to replicate the study. In points a-e below, I list the aspects that I could not fully understand, but I am not asking for direct answers to these points. Instead, please provide a detailed description of the experiment so that it can be replicated.

      Thank you for your suggestion; we have responded to each question sequentially and elaborated on the experiment specifics to ensure replicability.

      (a) Please provide more detailed information about the Group Identification Task. How much did each participant speak (was there any asymmetry in the amount of speaking, and was there any possibility that the asymmetry influenced the identification rating)? Did the three participants interact in person, or online? Are they isolated from experimenters? How was the rating conducted, what I mean is that it's a PC-based rating?

      We apologize for the lack of detail in our description of the procedures for the experiment.

      For the first question, we drew upon previous studies concerning the manipulation of group identity while controlling the content of pre-task conversations. Specifically, the high-identity group engaged in self-introductions and identified similarities among the three members, whereas the low-identity group discussed topics related to the current semester's classes (Xie et al., 2023; Yang et al., 2020). Both discussions were conducted for the same duration of three minutes, ensuring that the number of exchanges between the two groups remained comparable. There was almost no asymmetry in the amount of speaking. We also conducted a manipulation check, which confirmed the effectiveness of our identity manipulation (pp.5-6).

      Xie, E., Li, K., Gu, R., Zhang, D., & Li, X. (2023). Verbal information exchange enhances collective performance through increasing group identification. NeuroImage, 279, 120339.

      Yang, J., Zhang, H., Ni, J., De Dreu, C. K., & Ma, Y. (2020). Within-group synchronization in the prefrontal cortex associates with intergroup conflict. Nature neuroscience, 23(6), 754-760.

      “Both discussions were conducted for the same duration of three minutes, ensuring that the number of exchanges between the two groups remained comparable.”  p.5-6

      For the second question, the three participants interacted offline in a face-to-face setting, while the experimenter remained outside the laboratory (p.6).

      “The three participants conducted face-to-face offline interaction throughout the manipulation process.” p.6

      For the third question, at the beginning of the experimental task, participants were isolated from the experimenters (p.6).

      “In addition to explaining the next phase of the task and controlling the timer, experimenters would be isolated from participants.” p.6

      For the last question, the rating of group identification was conducted through a questionnaire presented on participants’ phones (p.6).

      “The questionnaire was presented on participants’ phones.” p.6

      (b) The procedures of the Main Task are also unclear. For the Reading Information (5 min): How was the information presented? PC-based or paper-based? How were the participants seated? Did they read it independently?

      We apologize for the missing details. We have included the following information in the article.

      For the first and last questions: each participant received a sheet of paper presenting both the common information and their private information, and read it independently (p.6).

      “Each participant would get a piece of paper, which presented the information. Participants could read independently.” p.6

      Regarding seating, the three participants sat around a table with no partitions between them. They could communicate face-to-face only during the discussion stage (p.6).

      “They sat around a table without partitions between each other.” p.6

      “In this process of discussion, the participants were able to communicate face-to-face and verbally.” p.6

      (c) For Sharing Private Information: The authors stated they share text messages using Tencent Meeting. If so, how and with what devices? How was the information displayed on the screen? Were the participants even in the same room?

      Thank you for your reminder. We have now added more details (p.6). First, the experimenter sent the Tencent Meeting link to the participants. After joining the meeting on their mobile phones, participants could type the information they wanted to share into the meeting's chat box. All three participants were in the same room, and because Tencent Meeting kept a record of the shared messages, they could view them at any time.

      “During the group sharing, participants entered Tencent Meeting via their mobile phones and were able to text their private information in the chat box to their group members for 5 minutes.” p.6

      (d) For Discussing Information: It's a verbal interaction. How did they interact with others? What is the distance between them? I found a very small picture in Figure 8, but that is all information about experiment settings, that is provided by the authors.

      We apologize for the missing details. As explained in the article, the discussion was verbal, so participants talked face-to-face in the same room. We have added the following information to the article (p.6).

      “Participants were sitting and communicating around a table. The distance between adjacent participants was about 15 cm, and the distance between face-to-face participants was about 40 cm. In this process of discussion, the participants were able to communicate face-to-face and verbally.” p.6

      (e) For the Decision Process (5 min): How did they answer (What I mean is verbally, writing, or computer-based input), and how did the experimenters record these answers?

      The questions were presented on paper; participants wrote down their answers, and the experimenters scored the answers from these sheets. We have added the following information to the article (p.7).

      “After discussion, all triads were given 5 minutes to answer the following questions: (i) the probability for each of the three suspects, from 0% to 100% for each suspect; (ii) the motive for and the tool used in the crime; and (iii) a deduction of the entire process of the crime. The three questions were presented on paper, allowing participants to write their answers directly on the same sheet. Subsequently, three independent raters used these paper questionnaires to record and calculate the scores for each group.” p.7

      (2) I find the model presented in Figure 7 to be intriguing. Understanding why inter-brain synchronization occurs and how it is supported by specific single-brain activations or intra-brain functional connectivity is indeed a critical area for researchers conducting hyperscanning studies to explore. However, the content depicted in this model is not based on the results of this study. This is because the study did not investigate the causal relationships among the three metrics. I guess, Figure 5D might be intended to explain this, but the details of the analysis are not provided, making it unclear what is being presented. Please include a detailed explanation.

      Please see our response to the corresponding comment in the public review above, where we address Figure 7 and Figure 5D in detail.

      (3) The analysis of single-brain activation analysis (and probably other analyses) focuses on the period from reading to making decisions (L237). Why was this entire interval chosen for analysis? Reading does not involve social interaction. As mentioned in a previous comment, the details of the tasks are unclear, so it's difficult to understand what was actually done in the reading period. Anyway, why were these different phases combined as the focus of analysis? Please clarify the reasoning behind this choice.

      Thank you for your feedback. We analyzed the entire interval from reading to decision-making primarily to capture the full continuum of information processing. Although reading itself involves no social interaction, it lays the foundation for the subsequent decision-making, during which participants' cognitive states and affective responses gradually evolve. Examining these phases together therefore allows a more thorough investigation of how information influences decision-making. Furthermore, given the ambiguity regarding the task details, we aimed to examine the underlying cognitive and affective mechanisms through a holistic analysis.

      (4) The method for analyzing single-brain activation is unclear. Please provide a detailed description of the analysis methods.

      Thank you for your suggestion. We have added more details in the Methods section (p.11).

      “In the GLM analysis, HbO was the dependent variable, and the regressors corresponded to the four task stages (a. Reading information, b. Sharing private information, c. Discussing information, d. Decision). We then convolved each regressor with the hemodynamic response function (HRF) and obtained, through regression analysis, the β value of brain activation for each participant in each channel at each task stage.” p.11
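      The stage-wise GLM in the quoted passage can be sketched as follows. This is a minimal illustration assuming boxcar regressors for each task stage, a canonical double-gamma HRF with SPM-like default shape parameters, ordinary least squares, and a constant term; the stage timings, sampling rate, and function names are hypothetical, and the toolbox the authors used may differ in details such as filtering and serial-correlation handling.

```python
import math
import numpy as np


def canonical_hrf(dt, length=32.0):
    """Double-gamma HRF: gamma(shape 6) response minus 1/6 of a
    gamma(shape 16) undershoot, sampled every dt seconds, summing to 1."""
    t = np.arange(0.0, length, dt)
    peak = t**5 * np.exp(-t) / math.gamma(6)
    undershoot = t**15 * np.exp(-t) / math.gamma(16)
    h = peak - undershoot / 6.0
    return h / h.sum()


def design_matrix(stage_bounds, n_samples, fs):
    """stage_bounds: list of (onset_s, offset_s), one per task stage.
    Returns HRF-convolved boxcar regressors plus a constant column."""
    hrf = canonical_hrf(1.0 / fs)
    cols = []
    for onset, offset in stage_bounds:
        box = np.zeros(n_samples)
        box[int(onset * fs):int(offset * fs)] = 1.0
        cols.append(np.convolve(box, hrf)[:n_samples])  # keep causal part only
    cols.append(np.ones(n_samples))
    return np.column_stack(cols)


def stage_betas(hbo, X):
    """OLS estimate of one beta per regressor for a single channel's HbO series."""
    beta, *_ = np.linalg.lstsq(X, hbo, rcond=None)
    return beta
```

      If the four stages tile the entire recording with no baseline, the convolved stage regressors sum to nearly a constant; in that case the constant column should be dropped (or a baseline period retained) to keep the design matrix well conditioned.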

      (5) In the periods of Reading Information and Sharing Private Information, there appears to be no social interaction between participants (Figure1D). However, Figure 6 shows an increase in brain activity correlation even during the first 10 minutes (it corresponds to the Reading and Sharing period). Why does inter-brain correlation (GNS, in this study) increase even though there is no interaction between participants? Please provide an explanation.

      Sharing private information does involve interaction: participants exchanged their private information with one another through Tencent Meeting. Previous research suggests that increased correlations in brain activity can also be attributed to (1) intrinsic cognitive processes, whereby participants show similar cognitive and emotional responses, producing shared cognitive processing and synchronized brain activity despite limited external interaction; (2) emotional connections, as disclosing private information elicits emotional responses that can be neurally correlated across individuals; and (3) environmental influences, whereby a shared environment and context can synchronize neural activity among participants even without direct social engagement. Together, these factors can increase brain-activity correlations in the absence of active interaction. Our primary focus, however, is the phase characterized by significant synchronized brain activity.

      Minor Comments:

      (6) Equation 1 Explanation: There is no explanation of Equation 1. It mentions Yi as the collective score, but what constitutes the collective score Yi is not defined in the manuscript. Additionally, while "i" is referred to as an item (in Line 196), the meaning of "item" is not clear. Therefore, the meaning of this equation is not understood.

      We apologize for this confusion. We have added a description in the manuscript (p.9).

      “In Eq.1, x is the individual score, y is the collective score (y is calculated from the three per capita scores), and i denotes the group number for the item. Thus, x_i is the individual score of a participant in group i, y_i is the collective score of group i, and d(x, y) represents the distance from the individual score to the collective score.” p.9

      (7) Equation 2 Explanation: There is no explanation for Equation 2. Please provide descriptions for all variables such as S, t, and w.

      We had defined s, t, and W in the original manuscript (p.12).

      As shown in L291-293: Here, t denotes the time, s denotes the wavelet scale, 〈⋅〉 represents a smoothing operation in time, and W is the continuous wavelet transform (Grinsted, Moore, & Jevrejeva, 2004).
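      For reference, the wavelet transform coherence of Grinsted, Moore, and Jevrejeva (2004), to which these variable definitions correspond, is usually written as below. We assume Equation 2 takes this standard form; note that Grinsted et al. apply the smoothing operator in both time and scale, whereas the text above mentions smoothing in time only.

```latex
R^2(s,t) = \frac{\left|\left\langle s^{-1} W^{XY}(s,t) \right\rangle\right|^2}
                {\left\langle s^{-1} \left|W^{X}(s,t)\right|^2 \right\rangle
                 \left\langle s^{-1} \left|W^{Y}(s,t)\right|^2 \right\rangle},
\qquad
W^{XY}(s,t) = W^{X}(s,t)\,\overline{W^{Y}(s,t)}
```

      Here W^X and W^Y are the continuous wavelet transforms of the two signals, and the overline denotes the complex conjugate; R^2 lies between 0 and 1.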

      (8) Acronyms: Please define all acronyms upon their first appearance (e.g., CFI, TLI, RMSEA in L380).

      We apologize for these mistakes, and we have added full explanations for abbreviations upon their first use (p.16).

      “The mediation model demonstrated a satisfactory fit (comparative fit index [CFI] = 0.93, Tucker-Lewis index [TLI] = 0.93, root mean square error of approximation [RMSEA] = 0.04), suggesting that the perceived group identification of each individual affected the alterations in single-brain activations in the DLPFC, consequently leading to variations in their performance (β_a = 0.16, t = 2.20, p = 0.030; β_b = 0.26, t = 3.56, p < 0.001; β_c = 0.18, t = 2.34, p = 0.020) (Figure 3C).” p.16

      (9) Hyperscanning fMRI Studies: Since there are hyperscanning fMRI studies analyzing communication among three people (e.g., Xie et al., 2020, PNAS), it would be beneficial to cite this research. pnas.org/doi/pdf/10.1073/pnas.1917407117.

      As suggested, we have cited this paper. (p.4)

      (10) Line 272; Line 275: Should these references be to Benjamini & Hochberg (1995)?

      As suggested, we have revised our citation.
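      Since the corrected citation is to Benjamini and Hochberg (1995), the false discovery rate procedure it refers to, presumably applied to channel-wise p-values, can be sketched as below. This is the standard step-up procedure; the function names and the level q = 0.05 are illustrative assumptions rather than details taken from the manuscript.

```python
import numpy as np


def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of discoveries under BH false discovery rate control at level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    thresholds = q * np.arange(1, m + 1) / m
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank i with p_(i) <= i*q/m
        reject[order[:k + 1]] = True       # reject every hypothesis up to that rank
    return reject


def bh_adjusted(pvals):
    """BH-adjusted p-values (monotone step-up), capped at 1."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    adj = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)
    return out
```

      A hypothesis is rejected at level q exactly when its adjusted p-value is at most q, so either function can be used depending on whether a mask or adjusted values are reported.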

      (11) Research Objectives: The authors' aim seems to be understanding the relationship between Group Identification Level (High or Low), collective performance, and inter-brain synchronization (GNS). If so, shouldn't the results shown in Figure 6 illustrate how these differ between High and Low groups?

      We are grateful to the reviewer for this insightful comment. This study aimed to investigate the impact of group identity levels on collective performance and interbrain synchronization. Our analysis primarily focused on inter-group disparities to elucidate the potential influence of varying levels of group identification on collective behavior and neural synchrony, as highlighted by the reviewer. It is important to note that the relationship between group identification levels and collective performance, as well as neural synchronization, may represent a continuous or correlational process, rather than a binary comparison between two distinct groups. Notably, we treated group identification as a continuous variable and, consequently, Figure 6 was designed to illustrate trends in the association between group identification levels and both collective performance and neural synchronization, without conducting significance tests between groups. We are confident that the depiction in Figure 6 effectively captures the evolving dynamics between group identification levels and both collective performance and neural synchronization.

      (12) Figure 6 Star-Marker: What is the star marker shown in Figure 6? Please provide an explanation.

      We apologize for this confusion. We have added this explanation to the article (p.21).

      “The red star sign indicates that at this time point, the neural signal began to increase significantly.” p.21

      (13) Pearson's Correlation: Use "Pearson's correlation" instead of "Pearson correlation."

      Thank you for your comment. We have changed “Pearson correlation” to “Pearson’s correlation” in a total of 10 places in the original text (pp. 9, 11, 13, 15, 16, 19, 23).

      “Moreover, the Pearson’s correlation was used to examine the relationship between group identification_2 and collective performance.” p.9

      “Subsequently, we used Pearson’s correlation analyses to investigate the relationship between single-brain activation and individual performance.” p.11

      “Second, the Pearson’s correlation between GNS and collective performance was performed.” p.13

      “Following that, we analyzed Pearson’s correlations between the original HbO data in the region related to individual and collective performance, denoted as brain activation connectivity (Lu et al., 2010).” p.13

      “Subsequently, the Pearson’s correlation between the quality of information exchange and collective performance was assessed.” p.15

      “Furthermore, the results of the Pearson’s correlation indicated that groups with higher group identification were more likely to exhibit better collective performance (r = 0.38, p = 0.003) (Figure 2B).” p.15

      “The Pearson’s correlation and its associated analyses were based on the data from group identification_2. *p < 0.05.” p.16

      “We first extracted the HbO brain activities related to individual performance (e.g., DLPFC, CH4) and collective performance (e.g., OFC, CH21) of each group member and conducted a Pearson’s correlation between the two.” p.19

      “Subsequently, Pearson’s correlation was used to test whether individual differences in the similarity in individual-collective performance were reflected by DLPFC-OFC connectivity.” p.19

      “Pearson’s correlation showed that the higher the quality of information exchange, the better the collective performance (r = 0.36, p = 0.007) (Figure 8C).” p.23

      (14) MNI Coordinates: The MNI coordinates for each channel are listed in the supporting information. How were these coordinates measured? Were they consistent for all participants? Was MRI conducted for each participant to obtain these coordinates?

      Thank you for your reminder. We have included the necessary information in the revised version. First, we note that we determined the placement of the optode probe sets with reference to previous literature. After data collection, we used the Vpen positioning system to locate the optodes precisely and obtain their MNI coordinates. These coordinates were largely consistent across participants (p.8).

      “For each participant, one 3 × 5 optode probe set (8 emitters and 7 detectors forming 22 measurement points with 3 cm optode separation, see Table S1 for detailed MNI coordinates) was placed over the prefrontal cortex (reference optode is placed at Fpz, following the international 10-20 system for positioning). The other 2 × 4 probe set (4 emitters and 4 detectors forming 10 measurement points with 3 cm optode separation, see Table S2 for detailed MNI coordinates) was placed over the left TPJ (reference optode is placed at T3, following the international 10-20 system for positioning). The probe sets were examined and adjusted to ensure consistency of the positions across the participants. After data collection, we used the Vpen positioning system to locate the optodes and obtain their MNI coordinates.”  p.8

    1. Author response:

      The following is the authors’ response to the previous reviews

      Joint Public Reviews:

      Summary

      This manuscript explores the transcriptomic identities of olfactory ensheathing cells (OECs), glial cells that support life-long axonal growth in olfactory neurons, as they relate to spinal cord injury repair. The authors show that transplantation of cultured, immunopurified rodent OECs at a spinal cord injury site can promote injury-bridging axonal regrowth. They then characterize these OECs using single-cell RNA sequencing, identifying five subtypes and proposing functional roles that include regeneration, wound healing, and cell-cell communication. They identify one progenitor OEC subpopulation and also report several other functionally relevant findings, notably, that OEC marker genes contain mixtures of other glial cell type markers (such as for Schwann cells and astrocytes), and that these cultured OECs produce and secrete Reelin, a regrowth-promoting protein that has been disputed as a gene product of OECs.

      Strengths

      This manuscript offers an extensive, cell-level characterization of OECs, supporting their potential therapeutic value for spinal cord injury and suggesting potential underlying repair mechanisms. The authors use various approaches to validate their findings, providing interesting images that show the overlap between sprouting axons and transplanted OECs, and showing that OEC marker genes identified using single-cell RNA sequencing are present in vivo, in both olfactory bulb tissue and spinal cord after OEC transplantation.

      Challenges

      Despite the breadth of information presented, and although many of the suggestions in the initial review were addressed well, some points related to quantification and discussion of sex differences are not fully addressed in this revision.

      (1) The request for quantification of OEC bridges is not fully addressed. We note that this revision includes the following statement (page 6): "We note, however, that such bridge formation is rare following a severe spinal cord injury in adult mammals." However, the title of the paper states that olfactory ensheathing cells promote neural repair and the abstract states that "OECs transplanted near the injury site modify the inhibitory glial scar and facilitate axon regeneration past the scar border and into the lesion." Statements such as these make it more crucial to include quantification of OEC bridges, because if single images are shown of remarkable, unusual bridges, but only one sentence acknowledges the low frequency of this occurrence, then this information taken together might present the wrong takeaway to readers.

      Including some sort of quantification of bridging, whether it be the number of rats exhibiting bridges, the percentage area of OECs near a lesion site, or some other meaningful analysis, would add rigor and clarity to the manuscript.

      The short answer to the OEC bridges quantification is that in our last 2 studies combined, we observed bridges in 3/13 OB-OEC-transplanted rats versus 0/16 control rats (p=0.042 by two-sample proportion test; Thornton et al., 2018, Dixie, 2019). In addition to the new data on bridge formation shown in the current manuscript, our previous and most impressive data of serotonergic axons (5-HT-labeled, red) that crossed the entire lesion site are shown below (from Thornton et al., 2018). The image together with Supplemental video 1 (https://ars.els-cdn.com/content/image/1-s2.0-S0014488618302632-mmc1.mp4) show a reconstruction of multiple sections containing serotonergic axons that bridge the injury site in one OEC-transplanted, completely transected rat (1/5 OEC vs. 0/5 fibroblast-transplanted rats). The video also shows retrogradely-labeled Pseudo-rabies virus taken up by a few scattered neurons (green dots) within and above the lesion site, additional evidence suggesting axonal regeneration.
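      The bridge comparison (3/13 OEC-transplanted vs. 0/16 control rats) can be checked with a short calculation. The sketch below assumes the pooled, normal-approximation form of the two-sample proportion test. This is one common variant, and the response does not state which variant was used. The function name is illustrative.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z-test for proportions; returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0: p1 == p2
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided

# Rats with bridges: 3 of 13 OEC-transplanted vs 0 of 16 controls
z, p = two_proportion_z_test(3, 13, 0, 16)
print(f"z = {z:.2f}, p = {p:.3f}")  # z = 2.03, p = 0.042
```

      Under this variant, the calculation reproduces the reported p = 0.042. With counts this small, and zero events in the control group, the normal approximation is rough. An exact test, such as Fisher's, is a common alternative in that situation.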

      In addition to adding bridge quantification in the Results section, we now discuss quantified results on physiological and anatomical evidence of axon regeneration across the injury site from five of the six large spinal cord injury (SCI) studies conducted by the Phelps and Edgerton laboratories. Our studies used the most difficult SCI model, a complete, thoracic spinal cord transection in adult rats, followed by OB-OEC transplantation. This is the only model in which axon regeneration can be differentiated from axon sparing found in incomplete SCIs. An introductory paragraph now summarizes and references data generated from these studies that specifically addresses questions about how OECs modify the injury site and facilitate axonal outgrowth into and across the lesion core. While relatively few axons cross the entire injury site to reach the caudal spinal cord, many more axons project into the injury site of OEC-transplanted rats compared to those in control rats. Quantification of axonal outgrowth into the lesion site of completely transected, OEC-transplanted rats from three previous long-term studies is now discussed in the Introduction. Based on both physiological and anatomical evidence reviewed from our previous work, we hope the editors and Reviewer agree that our previous studies have shown that OECs promote axonal outgrowth and modify the injury site.

      Page 5, Introduction:

      “Together with collaborators, we conducted six spinal cord injury studies in adult rats with a completely transected, thoracic spinal cord model followed by OB-OEC transplantation (Kubasak et al., 2008; Takeoka et al., 2011; Ziegler et al., 2011; Khankan et al., 2016; Thornton et al., 2018; Dixie, 2019). Results from five of our six studies showed physiological and anatomical evidence of axonal regeneration into and occasionally across the injury site. In 6-8-month-long studies, Takeoka et al. (2011) and Ziegler et al. (2011) reported physiological evidence of motor connectivity across the transection in OEC- but not media-transplanted rats. These experiments used transcranial electric stimulation of the motor cortex or brainstem to detect motor-evoked potentials (MEPs) with EMG electrodes in hindlimb muscles at 4- and 7-months post-transection. After 7 months, 70% of OEC-treated rats responded to stimulation with hindlimb MEPs (motor cortex, 5/20; brainstem 12/20; Takeoka et al., 2011). A complete re-transection above the original transection was carried out one month later and all MEPs in OEC-injected rats were eliminated. These results provide physiological evidence of axon conductivity across the injury site in OEC-treated rats. Additionally, three of our long-term studies evaluated anatomical axonal outgrowth of the descending serotonergic raphespinal pathway into and through the injury site. Significantly more serotonergic-labeled axons crossed the rostral inhibitory scar border (Takeoka et al., 2011) or occupied a larger area within the injury site core (Thornton et al., 2018, Dixie, 2019) in OEC-transplanted rats than in fibroblast or media controls. In addition, significantly more neurofilament-labeled axons were found within the lesion core of OEC-transplanted versus control rats (Thornton et al., 2018, Dixie, 2019).”

      Page 7, Results: We revised the sentence below and added additional information.

      “We note, however, that such bridge formation is rare following severe spinal cord injury in adult mammals and was detected in 2 out of 8 OEC-transplanted rats and 0/11 media or fibroblast-transplanted controls in this study (Dixie, 2019). Combined with the 1/5 OEC-transplanted rats with axons crossing the injury and 0/5 fibroblast controls in our previous study (Thornton, 2018), we observed bridges in 3/13 OEC-transplanted rats vs 0/16 controls (p=0.042, two-sample proportion test). Bridge formation, in conjunction with the additional physiological and anatomical evidence of axonal connections across the injury site presented in our previous studies, strongly supports the capacity of OECs in neural repair.”

      Page 46, Figure legend 1: We added statistical data to the legend

      “Bridge formation across the injury site was observed in 2 of 8 OEC-transplanted and 0 of 11 fibroblast- or media-transplanted spinal cord transected rats. Combined with the 1/5 OEC-transplanted rats with axons crossing the injury and 0/5 fibroblast controls in our previous study (Thornton, 2018), we observed bridges in 3/13 OEC-transplanted rats vs 0/16 controls (p=0.042, two-sample proportion test).”

      (2) The additional discussion of sex differences in OEC bridging elaborates on the choice to study female rats, citing bladder challenges in male rats, but does not note salient clinical implications of this choice. Men account for ~80% of spinal cord injuries and likely also have worsened urinary tract issues, so it would be important to acknowledge this clinical fact and consider including males in future studies.

      Response: We agree that studying SCI repair in male rodents is very important as most people with these injuries are male. We did find one publication by Walker et al. (2019, Journal of Neurotrauma 36:1974-1984) that looked at sex differences in age-matched male and female rats after a moderate contusion SCI. They examined a number of histological and functional features, and did not find many differences between the sexes. Compared to studies of moderate SCI, studies using a completely transected spinal cord model must carry out manual bladder expressions a minimum of twice a day throughout the entire 5 to 7-month study in order to maintain kidney health. Because male urethras are much longer than those of females, males are much more likely than females to die from kidney disease during complicated, long-term studies such as ours. Fortunately, most SCIs in humans are contusions rather than complete transections, so an incomplete contusion model is most appropriate for studying sex differences. We modified the previous statement in our Discussion section as below.

      Page 25, Discussion

      “We acknowledge that in humans, males account for ~80% of spinal cord injuries (National Spinal Cord Injury Statistical Center, 2024) and sustain more serious urinary tract issues than females. We examined females in the current study due to practical experimental considerations, but it is necessary to examine males in future studies.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) It is strongly recommended that some sort of quantification of bridging be included in the figures or in a table, whether this is the number of rats showing bridges, the percent area of OECs near the lesion site, or some other meaningful analysis.

      As discussed in the response in Challenge section (1) above, we observed bridges in 3/13 OEC-transplanted rats vs 0/16 controls across our two most recent studies. In addition, we added evidence of physiological and anatomical axonal connections across the injury site from our previous studies. We have added the additional information in the Introduction, Results, and Figure legend 1.

      (2) It is recommended that clinical sex differences in spinal cord injury (with ~80% occurring in men) be acknowledged in the Discussion. This clinical fact could be directly mentioned without much justification.

      See Challenge (2) above and addition to the Discussion on page 25.

      (3) Figs. 1, 5, 6: There is still no quantification included for these figures, which detracts from the ability of readers to understand the context and importance of these results. It is recommended to include quantification for these figures.

      Response regarding quantification associated with Figures 1, 5 and 6:

      Regarding Figure 1: We have discussed the additions to the text of the Introduction, Results and the legend of Figure 1 in detail on pages 2-3 of this response. These are important new additions to our paper.

      Regarding Figure 5: We added quantitative information regarding the analysis of Connective Tissue Growth Factor (Ctgf) expression in the injury site.

      Page 10-11, Results:

      “We found high levels of Ctgf expression in GFP-OECs (n=4 rats) that bridged much of the injury site and also detected Ctgf on near-by cells (Figure 5d, d1-2). GFP-labeled fibroblast transplantations (n=3 rats) served as controls and also expressed Ctgf.”

      Page 36, Methods:

      “To examine Ctgf expression in the spinal cord lesion site, we processed 1 slide per animal with ~6 equally-spaced sagittal sections throughout the spinal cord from the Khankan et al. (2016) study. Our aim was to assess if transplanted OECs (n=4 rats) and transplanted fibroblasts (n=3 rats) express CTGF in the injury site.”

      Regarding Figure 6: The statistics for Figure 6 are found on page 13 of the Results section and page 38 of the Methods section. We now added the statistics to the Figure 6 legend on page 49.

      Page 13, Results:

      “To determine if the proliferative OECs differ in appearance from adult OECs, and whether there is concordance between our OEC subtypes based on gene expression markers and previously described morphology-based OEC subtyping (Franceschini & Barnett, 1996), we analyzed OECs identified with the anti-Ki67 nuclear marker and anti-Ngfr<sup>p75</sup> (Figure 6g-h). Of the Ki67-positive OECs in our cultures, 24% ± 8% were strongly Ngfr<sup>p75</sup>-positive and spindle-shaped, whereas 76% ± 8% were flat and weakly Ngfr<sup>p75</sup>-labeled (n=4 cultures, p = 0.023). Here we show that a large percentage (~three-quarters) of proliferative OECs are characterized by large, flat morphology and weak Ngfr<sup>p75</sup> expression resembling the previously described morphology-based astrocyte-like subtype. Our results indicate the two types of OEC classifications share certain degrees of overlap, indicating similarities but also differences between the two classification methods.”

      Page 38, Methods: Morphological analyses of Ki67 OEC subtypes

      “To determine if OEC progenitor cells marked with Ki67 immunoreactivity have a distinctive morphology, purified and fixed OEC cultures from 4 rats were processed with anti-Ngfr<sup>p75</sup>, anti-Ki67 and counterstained with Hoechst (bisbenzimide, 1:500, Sigma-Aldrich, #B2261). Images were acquired from 7-10 randomly selected fields/sample using an Olympus AX70 microscope and Zen image processing and analysis software (Carl Zeiss). We distinguished the larger, flat ‘astrocyte-like’ OECs from the smaller, fusiform ‘Schwann cell-like’ OECs, and recorded their expression of Ngfr<sup>p75</sup> and Ki67. Cell counts from each field were averaged per rat and then averaged into a group mean ± SEM. A Student’s t-test was conducted to compare the effect of Ngfr<sup>p75</sup>-labeled cell morphology and the proliferative marker Ki67. Statistical significance was determined by p < 0.05.”
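      The aggregation described above can be sketched as follows: counts are averaged per rat across fields, and the per-rat values are then averaged into a group mean ± SEM. The sketch assumes each imaged field is recorded as a pair of counts (flat, fusiform) of Ki67-positive OECs. The function names and all counts are hypothetical placeholders.

```python
import math
from statistics import mean, stdev

def percent_flat_for_rat(fields):
    """fields: (flat_ki67, fusiform_ki67) counts per imaged field for one rat.
    Counts are averaged across fields first, then converted to a percentage."""
    flat_mean = mean(f for f, _ in fields)
    fusiform_mean = mean(s for _, s in fields)
    return 100 * flat_mean / (flat_mean + fusiform_mean)

def mean_sem(values):
    """Group mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))

# Hypothetical counts for 4 cultures (one rat each), three fields per culture
cultures = [
    [(3, 1), (4, 2), (2, 1)],
    [(5, 1), (3, 1), (4, 2)],
    [(2, 2), (3, 1), (2, 1)],
    [(4, 1), (5, 2), (3, 1)],
]
per_rat = [percent_flat_for_rat(c) for c in cultures]
m, sem = mean_sem(per_rat)
print(f"flat, Ki67+: {m:.0f}% ± {sem:.0f}% (mean ± SEM, n={len(per_rat)})")
```

      The t-test then compares the per-rat values produced by this step, not the individual field counts.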

      Page 49, Figure 6 legend:

      “Of the OEC progenitors that express Ki67, 76% ± 8% of them display low levels of Ngfr<sup>p75</sup> immunoreactivity and a “flat” morphology (g2, h2; green nuclei, arrowheads). The remainder of Ki67-expressing OECs express high levels of Ngfr<sup>p75</sup> and are fusiform in shape (24% ± 8%, n=4 cultures, Student’s t-test, p = 0.023).”

      (4) Fig. 9: Quantification is still not included in the figure for these Western blots, although it is appreciated that the authors included some quantification in their response letter. Including this in the figure would provide clarification for the reader.

      Thank you for your suggestion. We have now added the quantification to Figure 9, together with a description of the methods used for western blot quantification and an updated figure legend.

      Page 32, Methods:

      “For quantification, ImageJ software (NIH) was used to analyze the densitometric data. Western blot images containing the 400, 300, and 150 kDa bands were converted to grayscale, followed by manually defining a Region of Interest (ROI) frame that captured the entire band in each lane using the "Rectangular" tool. The area of each selected band was measured by employing the same ROI frame around the band to record the integrated density, “Grey Mean Value”. Background measurements were similarly quantified, and background subtraction was performed by deducting the inverted background from the inverted band value. For relative quantification, target protein bands were normalized to the corresponding loading control (GAPDH) to derive normalized protein expression (fold change). Band intensities were quantified in triplicate for each sample. Data were analyzed with the Mann-Whitney U test to compare normalized protein expression between the Reln<sup>-/-</sup> group and the other groups. A one-sided p-value was calculated to test the hypothesis that protein expression levels in the other groups are greater than those in the Reln<sup>-/-</sup> group (negative control). Statistical significance was determined at p < 0.05. Analysis was performed using GraphPad Prism (version 9).”
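      The densitometry and test steps described above can be sketched in plain Python. This is not the authors' ImageJ/Prism workflow. The sketch assumes 8-bit mean gray values per ROI, so inversion is 255 minus the value. The function names and example values are hypothetical. The exact one-sided Mann-Whitney test is implemented through the equivalent Wilcoxon rank-sum statistic and assumes no tied values.

```python
from itertools import combinations
from math import comb

def net_band_signal(band_gray, background_gray, max_gray=255):
    """Background-subtracted signal from inverted 8-bit gray values.
    Inverting (max_gray - value) makes darker bands give larger numbers."""
    return (max_gray - band_gray) - (max_gray - background_gray)

def fold_change(target_net, gapdh_net):
    """Normalize a target band to its lane's GAPDH loading control."""
    return target_net / gapdh_net

def one_sided_mann_whitney_exact(group, reference):
    """Exact one-sided p for H1: values in `group` tend to exceed `reference`.
    Enumerates all rank assignments; assumes no tied values."""
    pooled = sorted(group + reference)
    if len(set(pooled)) != len(pooled):
        raise ValueError("ties are not handled in this sketch")
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    observed = sum(rank[v] for v in group)  # rank sum W; U = W - n1(n1 + 1) / 2
    n, k = len(pooled), len(group)
    extreme = sum(1 for c in combinations(range(1, n + 1), k) if sum(c) >= observed)
    return extreme / comb(n, k)

# Hypothetical normalized fold changes (triplicate band measurements)
test_group = [1.8, 2.1, 1.6]
reln_ko = [0.2, 0.3, 0.25]
print(one_sided_mann_whitney_exact(test_group, reln_ko))  # 0.05
```

      With groups this small, the exact test has a floor on the attainable p-value of 1 / C(n1 + n2, n1). For two groups of three, that floor is 0.05.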

      Page 52, Figure legend 9f:

      “(f) Quantitation of multiple isoforms of Reelin from 4-15% gradient gels. Positive and negative controls are Reln<sup>+/+</sup> and Reln<sup>-/-</sup> mouse cortices. Both rat tissue from the ONL (n=3) and CM (n=9) contain more 400 and 300 kDa Reelin compared to the Reln<sup>-/-</sup> mouse. Bars represent the standard deviation of the mean. One-sided Mann-Whitney U test was used to test that protein expression levels in the other groups are greater than those in the Reln<sup>-/-</sup> group, indicative of significant expression of Reln in the test groups. *p < 0.05.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the Authors):

      The interpretation of results obtained with opto-Treacle (related to Figure 2C) may be expanded.

      We thank the reviewer for their insightful comment regarding the interpretation of the results obtained with opto-Treacle. We understand the concern that the difference in the size of the condensates formed by opto-Treacle (Figure 2C) compared to Treacle-2S or other constructs may raise questions about the role of tetramerization in driving condensate formation, as 2S is known to tetramerize while FusionRed is not susceptible to multimerization.

      To address this concern, we emphasize that we have demonstrated that overexpressed Treacle forms large condensates even in the absence of any fluorescent protein, as included in the revised manuscript. This observation supports the conclusion that Treacle's ability to form condensates is intrinsic and does not depend on the multimerization capacity of the fluorescent tag.

      We believe that the observed difference in condensate size between opto-Treacle and Treacle-2S, Treacle-GFP, or untagged Treacle arises primarily from the time available for condensate assembly. Opto-Treacle condensation occurs rapidly, within approximately 10 seconds of blue light illumination, whereas Treacle-2S, Treacle-GFP, or untagged Treacle undergo condensation over the extended period of 24–48 hours of protein overexpression. This temporal difference likely accounts for the disparity in condensate size, as longer assembly times allow for larger and more mature condensates to form.

      Given this reasoning, we consider it unnecessary to further emphasize the size differences in the main text of the article, as we believe the underlying explanation is clear and supported by the data. Nonetheless, we are open to incorporating additional clarifications if the reviewer deems it necessary.

      The authors might reconsider referring to Treacle as a scaffold. Ultimately, the scaffold for the nucleolus is the rDNA with its bound proteins. Scaffold proteins, by definition, bind multiple protein partners and facilitate the formation of multiprotein complexes, a role not really attributed to homotypic LLPS.

      We thank the reviewer for raising this important point regarding the use of the term "scaffold" in relation to Treacle. We fully acknowledge that rDNA, along with its associated protein complexes, serves as the primary structural scaffold for the nucleolus. However, we believe that referring to Treacle as a scaffold is appropriate and justified within the specific context of our study.

      First, we emphasize that we describe Treacle as a scaffold specifically for nucleolar fibrillar centers (FCs), rather than for the nucleolus as a whole. This distinction is important, as our work focuses on the role of Treacle in organizing FC components, rather than the broader structural organization of the nucleolus.

      Second, as the reviewer notes, scaffold proteins are defined by their ability to bind multiple protein partners and facilitate the formation of multiprotein complexes. Our findings demonstrate that Treacle's condensation properties promote the binding and retention of key rDNA-associated protein partners, including RPA194, UBF, and Fibrillarin, within the FCs. This activity aligns with the functional definition of a scaffold protein, as Treacle supports the spatial organization and cooperative interactions of FC components essential for rRNA transcription and processing. Therefore, while we appreciate the reviewer's observation regarding the central role of rDNA as a nucleolar scaffold, we maintain that the use of the term "scaffold" to describe Treacle's role in organizing FCs is consistent with its demonstrated functional properties.

      If authors decide to add the "Ideas and Speculation" subsection to their Discussion, it may be interesting to discuss the following outstanding questions: does Treacle undergo homotypic or heterotypic LLPS? Does its overexpression favor homotypic interactions? How does it segregate FC and DFC compartments -by exclusion? How does phase-separated Treacle interact with other proteins?

      We thank the reviewer for these insightful questions. While we believe that adding a dedicated "Ideas and Speculation" subsection would be redundant, we have already addressed the questions regarding Treacle’s homotypic or heterotypic LLPS and its interactions with other proteins in the revised "Discussion" section. Additionally, we have included a new section in the manuscript specifically focused on investigating the role of Treacle condensation in its interactions with protein partners, further expanding on these points.

      In Materials and Methods, smFISH section -"probes were designed as described (Yao et al, 2019) and labeled with FITS on the 3'ends" - was it meant to say FITC (i.e. Fluorescein)?

      We thank the reviewer for catching this error. This was indeed a typo, and we have corrected it to "FITC (i.e., Fluorescein)" in the revised text.

      Reviewer #2 (Recommendations for the Authors):

      Regarding recombinant Treacle, the main concern is that the authors may not be observing the condensation of Treacle itself. The quality of the purchased recombinant Treacle is unclear (this reviewer could not find Treacle listed on the vendor website despite using the supplied catalog number or various search terms). Furthermore, it is not clear whether the condensates observed are Treacle or potentially the dextran crowder. Only small percentages (1%-5%) of either dextran or PEG are needed to induce phase separation in two-component mixtures of these polymers, and PEG may be present in the Treacle storage buffer. In addition to clarifying the state of the recombinant Treacle, these concerns could be further assuaged by directly visualizing Treacle forming condensates (via fluorescent N-terminal tagging) and by filling in more of the phase space to observe the loss of condensates at a threshold concentration of Treacle. In general, the gold standard for establishing condensation of a given protein is mapping the full binodal phase diagram of the protein. Understanding that protein is a limited resource, most groups simply map the lower-concentration arm of the binodal, and this is sufficient to characterize a protein as having intrinsic condensation behavior. A similar mapping effort for Treacle would be welcome.

      We thank the reviewer for their thoughtful comments and for highlighting concerns regarding the interpretation of our experiments with commercial recombinant Treacle. We recognize the importance of ensuring that the observed condensation properties are intrinsic to Treacle and not influenced by potential contaminants, storage buffer components, or tags on the protein.

      To address these concerns, we have re-evaluated the condensation properties of Treacle using a recombinant fragment independently purified in our laboratory. Specifically, we expressed and purified a Treacle fragment (amino acids 291–426), which includes two S/E-rich low-complexity regions (LCRs) and two linker regions, in E. coli. The protein was expressed as a TEV-cleavable maltose-binding protein (MBP) fusion, purified under native conditions via amylose resin, and subjected to TEV cleavage. This was followed by ion-exchange chromatography and extensive dialysis to remove any remaining impurities. These additional steps ensured that the purified Treacle fragment was of high purity and free from confounding components, such as polyethylene glycol (PEG). We have included detailed descriptions of this protocol in the revised manuscript.

      Using this purified Treacle fragment, we confirmed its intrinsic condensation behavior in vitro. In the presence of 5% PEG8000 as a crowding agent, the fragment formed liquid-like condensates that exhibited spherical morphology and dynamic fusion events, key hallmarks of liquid-liquid phase separation (LLPS). Additionally, we demonstrated that the condensation of this Treacle fragment was sensitive to changes in pH and salt concentration but unaffected by 1,6-hexanediol treatment, suggesting that the condensates are stabilized predominantly by electrostatic interactions (Fig. 4B of the revised manuscript). Importantly, these findings provide robust evidence that Treacle possesses intrinsic phase-separation properties. All results from the commercial Treacle protein used in the initial version of the manuscript have been replaced with data obtained using this independently purified recombinant fragment.

      We understand that the condensation behavior of the fragment may not fully capture the behavior of full-length Treacle. Nevertheless, the in vitro experiments provide valuable mechanistic insights into the biophysical properties of Treacle. Furthermore, as emphasized in the revised manuscript, our study primarily focuses on understanding the condensation and functional role of Treacle in a cellular context, where we observe its critical involvement in organizing nucleolar structure and regulating rRNA transcription. These cellular experiments highlight the biological relevance of Treacle’s condensation behavior.

      With regard to mapping the binodal phase diagram of Treacle, we concur with the reviewer that such an effort would be ideal for a more comprehensive characterization of Treacle’s condensation properties. However, the limited availability of purified protein currently precludes a detailed mapping effort. Despite this limitation, we believe the qualitative assessments of Treacle’s condensation under varying conditions, now included in the revised manuscript, sufficiently demonstrate its intrinsic ability to phase-separate.

      In conclusion, we are grateful for the reviewer’s feedback, which has allowed us to refine our methodology and strengthen the evidence supporting the intrinsic condensation properties of Treacle. We are confident that the revised manuscript provides a robust and thorough characterization of Treacle’s phase-separation behavior and its functional role in the cell, addressing the reviewer’s concerns. Thank you for your constructive recommendations, which have significantly improved the quality of our work.

      Replacing 'liquid-phase' and 'liquid' with 'liquid-like' would make the language consistent with other papers in the field and more accurately reflect the degree of material state analysis carried out in the study.

      We thank the reviewer for this insightful recommendation. In response to the suggestion, we have revised the manuscript to replace the terms "liquid-phase" and "liquid" with "liquid-like" throughout the text. This change ensures consistency with terminology commonly used in the field and more accurately reflects the degree of material state analysis performed in our study. We believe this adjustment improves the clarity and precision of our findings, aligning the manuscript with standard practices in the field. Thank you for helping us enhance the quality of the presentation.

      The 'unclear' nature of the condensation behavior of the FC phase of the nucleolus is listed as a motivation for carrying out the study in the introduction; the authors could note here two recent papers that have investigated the nature of FC condensation: Jaberi-Lashkari et al. 2023 and King et al. 2024. The reviewer notes that while these were both pre-printed in late 2022, they were only recently published.

      We thank the reviewer for bringing these recent studies to our attention. In response to the suggestion, we have cited the papers by Jaberi-Lashkari et al. (2023) and King et al. (2024) in both the introduction and discussion sections of the revised manuscript. These references are highly relevant to the context of our study and provide valuable insights into the condensation behavior of the FC phase of the nucleolus. We agree that incorporating these works strengthens the framing of our study and situates it more effectively within the broader field. Thank you for this constructive recommendation.

      The statement that Treacle is "the main molecule present in the FC" is a substantial claim that does not need to be made to promote the author's case, nor is it well supported by the provided reference (Gal et al., 2022).

      We thank the reviewer for pointing out this overstatement in our original manuscript. In response, we have revised the text to provide a more accurate and well-supported description. Specifically, we have replaced the claim that Treacle is "the main molecule present in the FC" with a statement highlighting its direct interactions with UBF and RNA Pol I, as well as its colocalization with these proteins within the FC. This revision ensures alignment with the provided references and more accurately reflects the current understanding of Treacle's role in the FC. We appreciate the reviewer's attention to this detail, which has helped us improve the clarity and accuracy of our manuscript.

      The statement that "Treacle is one of the most intrinsically disordered proteins" is vague and unnecessarily grand. Treacle is a fully intrinsically disordered protein; these comprise 5% of the human proteome (Tsang et al. 2020), so Treacle is, indeed, unusual in that regard.

      We thank the reviewer for highlighting the vague and unnecessarily broad nature of the original statement. In response, we have revised the text to provide a more precise and accurate description of Treacle's structural properties. Specifically, we replaced the claim that "Treacle is one of the most intrinsically disordered proteins" with the statement that "According to protein structure predictors (e.g., AlphaFold, IUPred2, PONDR, and FuzDrop), Treacle is a fully intrinsically disordered protein." This wording reflects the unique nature of Treacle while remaining scientifically accurate and supported by reliable computational predictions. We appreciate the reviewer's feedback, which has allowed us to improve the rigor and clarity of our manuscript.

      A comment on the implications of the immobile pool of Treacle (which appears to be ~50% in WT and across a range of mutants) would be welcome. Additionally, the limitations of FRAP for interrogating the material properties of condensates in living systems are reviewed in Goetz and Mahamid, 2020. In this paper, the authors review instances where the ultrastructure of a condensate is known and where FRAP data are available. They show that crystalline assemblies can recover faster than apparently liquid, spherical assemblies. A comment in the text about how these limitations apply to this study would be welcome.

      We appreciate the reviewer’s insightful comments regarding the interpretation of the immobile pool of Treacle and the limitations of FRAP for characterizing material properties in living systems. As noted in our response to the public review, we believe the ~50% recovery rate after photobleaching observed in our experiments is best explained by the redistribution of Treacle molecules within the condensate, rather than significant exchange with the surrounding phase. This interpretation is strongly supported by the full- and half-FRAP analyses included in the revised manuscript, which demonstrated internal mixing dynamics within the condensates.

      There appears to be a typo in the following sentence: "The highly positively charged CD serves as the nucleation center for RD but exhibits ambivalent phase properties, transitioning from LLPS to LSPS in the absence of rRNA." The LLPS-to-LSPS behavior was observed for mutants of the central domain (RD), not the C-terminal domain (CD).

      Throughout, the authors report single snapshots of representative cells and single line traces. Analysis of the key morphological features across the population of cells would help the reader understand how widespread the observed phenotype is.

      We thank the reviewer for raising this important point regarding the representation of morphological features across the cell population. To address this concern, we have included widefield micrographs of cell fields in the revised figures to provide a more comprehensive view of the phenotypes observed.

      The statement that "The phase behavior of polymers is determined by interactions through associative motifs, referred to as stickers, separated by spacers, which are not the primary driving forces for phase separation" could be improved by pointing out that this description is potentially incomplete for the kind of condensation that highly charged polymers undergo. The high charge and charge segregation of Treacle suggest that it is a blocky polyampholyte and that it condenses by coacervation. Models of associative polymers can be useful for describing coacervation; however, the driving forces for coacervation are less well understood and have been proposed to include an entropic component (see Sathyavageeswaran et al. 2024, Sing and Perry 2020, and work from their groups as well as the Obermeyer (Columbia) and Tirrell (U. Chicago) groups).

      We thank the reviewer for highlighting this important aspect of the phase behavior of charged polymers and for suggesting relevant references. In response, we have revised the discussion section of the manuscript to include a more nuanced explanation of the condensation mechanisms for highly charged polymers such as Treacle. Specifically, we now describe Treacle as a blocky polyampholyte, suggesting that its condensation behavior may be driven by coacervation mechanisms. The relevant references have been added to the discussion section of the revised manuscript.

      In addition to the above, the authors may consider citing two recent publications from the Pappu group (King et al. Cell 2024 and King et al. Nucleus 2024) that directly investigate the condensation potential of K-rich and E/D-rich 'grammars' in nucleolar proteins and show that, as the authors found, the K-rich region is essential for localization and is conserved across nucleolar proteins.

      We thank the reviewer for bringing these relevant publications to our attention. The suggested references from the Pappu group (King et al., Cell 2024, and King et al., Nucleus 2024) have been added to the introduction and discussion sections of the revised manuscript, and their findings have been appropriately integrated into our analysis.

      The authors could consider replacing the use of LLPS with a more generic term such as "condensation" or "biomolecular condensation." LLPS of polymers is a segregative transition driven by incompatibility with the surrounding solvent. As indicated, Treacle is likely to be undergoing some form of coacervation (which is predominantly an associative transition), which can be generically described as condensation. See Pappu et al. 2023 for more details.

      We thank the reviewer for their insightful suggestion. Following the reviewer's recommendation, we have replaced the term "LLPS" with "condensation" or "coacervation" throughout the manuscript, where appropriate. Additionally, we have referenced Pappu et al. (2023) and other works to provide further context and clarity regarding the distinctions between these terms.

      The authors cite Yao et al. 2019, but do not cite the follow-up study (Wu et al. 2021) or comment on the role the Chen group finds for the RGG domain of FBL in keeping certain canonical markers of the FC and DFC de-mixed.

      We thank the reviewer for pointing out these important references. The relevant citations, including Wu et al. (2021), have been added to the manuscript.

      Reviewer #3 (Recommendations for the Authors):

      The following comment is true but could be broadened to include examples of structured regions promoting biomolecular condensation: "In biological systems, phase separation is mainly a characteristic of multivalent or intrinsically disordered proteins (Banani et al, 2017; Shin & Brangwynne, 2017; Uversky, 2019)."

      We have expanded the statement as recommended by the reviewer: "In biological systems, phase separation is facilitated by a combination of multivalent interactions mediated by intrinsically disordered proteins and site-specific interactions that drive percolation."

      Related to Figure 1.

      The authors report Treacle-dependent EU incorporation (Figure 1D), but are there any broader changes to nucleolar number or size as a consequence? How do the authors interpret the finding that the quantitative effect of AMD treatment is more extreme than that of Treacle depletion (Figure 1E)?

      We thank the reviewer for raising these important points. Regarding nucleolar number and morphology, we did not observe a change in the number of nucleoli upon Treacle depletion. However, nucleoli appeared more regularly rounded under these conditions, which we interpret as a consequence of the decreased rDNA transcription activity caused by Treacle depletion. A similar rounding of nucleoli is also observed upon actinomycin D (AMD) treatment, which is consistent with reduced transcriptional activity.

      As for the more pronounced effect of AMD compared to Treacle depletion on EU incorporation, this can be explained by the fundamentally different mechanisms through which these conditions affect transcription. Treacle depletion reduces the local concentration of transcription factors at rDNA sites, thereby impairing transcription initiation and elongation to a certain extent. However, under Treacle depletion, RNA polymerase I still retains the ability to bind to the promoter and support a residual level of transcription. In contrast, AMD acts as a potent intercalator in GC-rich regions of rDNA, physically blocking the ability of RNA polymerase I to move along rDNA, resulting in near-complete cessation of rRNA synthesis.

      Related to Figure 2.

      The authors observe that AMD leads to coalescence of individual Treacle-2S+ bodies (e.g. Figure 2E) - does this suggest that ongoing rRNA transcription is required to prevent such events?

      Thank you for your thoughtful question. Indeed, our observations strongly suggest that ongoing rRNA transcription is required to prevent the coalescence of Treacle-2S+ bodies, as observed upon AMD treatment. This interpretation aligns with the findings of Tetsuya Yamamoto et al., who demonstrated that nascent ribosomal RNA (pre-rRNA) acts as a surfactant to suppress the growth and fusion of fibrillar centers (FCs) in the nucleolus. Their work highlighted that nucleolar condensates formed via liquid-liquid phase separation (LLPS) tend to grow to minimize surface energy, provided sufficient components are available. However, the transcription of pre-rRNA stabilizes FCs by maintaining multiple microphases, preventing coalescence unless transcription is inhibited.

      According to Yamamoto et al., nascent pre-rRNAs tethered to FC surfaces by RNA Polymerase I generate lateral pressure that counteracts interfacial tensions, effectively suppressing FC fusion. This activity is analogous to the surfactant properties of molecules in physical systems. When transcription is inhibited (e.g., by AMD), the loss of nascent rRNA allows condensates to coalesce, consistent with the behavior we observe.

      We further propose that the AMD-induced coalescence of Treacle-2S+ bodies reflects the loss of this surfactant-like effect, as transcriptional activity ceases. This theory is also supported by the observation that Treacle condensates in the nucleoplasm, where rRNA transcription is absent, form larger structures. Collectively, these insights highlight the critical role of ongoing rRNA transcription in maintaining the structural integrity and dynamic organization of nucleolar substructures.

      Related to Figure 3.

      In panels B-H, the DAPI signal in gray obscures the Treacle localization, especially in Figure 3H. A non-merged image showing the Treacle localization for each of these examples would be very helpful.

      We thank the reviewer for this observation. To address this, we have included wide-field images without the DAPI overlay for the deletion mutant lacking the 1121-1488 region. These are now presented in Supplementary Figure S5G of the revised manuscript.

      Related to Figure 5.

      Only a single representative nucleus is shown in the PLA analysis presented in Figure 5B. Quantification is needed to assess the robustness of this response to the addition of VP16. The authors use ChIP and immunocytochemistry as orthogonal methods, so it would be best to show both for each manipulation performed; the immunostaining of TOPBP1 in the Treacle KD cells in S5A should be in the main Figure 5 to complement the transfection of constructs as in Figure 5D.

      We appreciate the reviewer’s comment. To address this, we performed a quantitative analysis of PLA fluorescence signals in control and etoposide-treated cells, and the results are now presented in Supplementary Figure S8C. Additionally, as recommended, we have transferred the results of the immunocytochemistry of TOPBP1 in Treacle KD and Treacle KN cells to the main figure, now included as Figures 7D-E in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary.

      In this meticulously conducted study, the authors show that Drosophila epidermal cells can modulate escape responses to noxious mechanical stimuli. First, they show that activation of epidermal cells evokes many types of behaviors, including escape responses. Subsequently, they demonstrate that most somatosensory neurons are activated upon stimulation of epidermal cells, and that this activation has a prolonged effect on escape behavior. In vivo analyses indicate that epidermal cells are mechanosensitive and require the store-operated calcium channel Orai. Altogether, the authors conclude that epidermal cells are essential for nociceptive sensitivity and sensitization, serving as primary sensors of noxious stimuli.

      Strengths.

      The manuscript is clearly written. The experiments are logical and complementary. They support the authors' main claim that epidermal cells are mechanosensitive and that epidermal mechanically evoked calcium responses require the store-operated calcium channel Orai. Epidermal cells activate nociceptive sensory neurons as well as other somatosensory neurons in Drosophila larvae, and thereby prolong escape rolling evoked by mechanical noxious stimulation.

      Weaknesses.

      Core details that other researchers need to reproduce the experiments are missing from the protocols, including the LED intensity used. For most experiments, the epidermal cells are activated for 60 s, which is long when considering that nocifensive rolling occurs on a timescale of milliseconds. It would be informative to know the shortest duration of epidermal cell activation that is sufficient for observing the behavioral phenotype (prolongation of escape behavior) and activation of sensory neurons.

      (1) We agree with the reviewer that the LED intensity is an important detail of the experimental paradigm. We updated the methods to include intensity measurements for the stimuli used throughout the manuscript.

      (2) The Reviewer asks about the shortest duration of epidermal cell activation sufficient for observing the behavior phenotype. We note in the manuscript that behavioral responses to optogenetic epidermal stimulation are apparent within 2 seconds of stimulus (see Figure 2F); this is consistent with our calcium imaging data in which C4da response reaches its maximum within 2-3 sec of stimulation.

      Reviewer #1 (Recommendations):

      (1) The epidermal cells in this study are activated for 60 s. In the real world, the nociceptive stimulation (a poke, such as penetration by the ovipositor of a parasitic wasp) that evokes escape rolling is short. Does optogenetic activation of 1 s or less still evoke rolling? For example, it is unclear in Figure 4K how long the epidermal cells need to be activated before the poke stimulus prolongs rolling. Is it possible to test behavior and GCaMP activity in sensory neurons when epidermal cells are briefly (1 second) activated?

      As described above, behavioral responses to optogenetic epidermal stimulation are apparent within 2 seconds of stimulus (see Figure 2F); this is consistent with our calcium imaging data in which C4da response reaches its maximum within 2-3 sec of stimulation. The kinetics are consistent with a role for epidermal cells in modulating neuronal responses to nocifensive stimuli, and similar to the response kinetics observed in mammalian epidermal cells that modulate neuronal touch and pain responses (Maksimovic et al., 2014; Woo et al., 2014; Mikesell et al., 2022).

      (2) The protocol for optogenetic screening states that the authors used a 488-nm LED. Why was a 488-nm LED used instead of the 610-nm LED for Chrimson activation? No information (except figure 4K) about the light intensity is provided in the figure legend or the protocol section. Please state the LED intensity used for all optogenetic experiments (GCaMP imaging, behavioral experiments, etc.).

      We used 488 nm light for the initial screen for technical reasons. The screen was conducted by students at the MBL Neurobiology course (hence the affiliation; student authors are included in the manuscript), and the only LED available to us at that time delivered insufficient illumination at longer wavelengths to be useful. We chose to include the students’ data because (1) we found that the 488 nm light alone did not induce rolling in our setup, (2) we repeated and extended the studies with the epidermal drivers using a higher resolution imaging platform and longer wavelength stimulation (all studies other than Fig. 1), and (3) we observed qualitatively similar results when we repeated stimulation with all drivers using 561 nm light.

      We agree that the LED intensity is an important detail of the experimental paradigm. We updated the methods to include intensity measurements for the stimuli used throughout the manuscript. We also include the intensities here:

      - 30 μW/mm^2 for calcium imaging experiments Fig 3B-E, Fig 4A, Fig 3S1A-D, Fig 4S1A

      - 300 μW/mm^2 for behavior studies in Fig 2B-E, Fig 1S6, Fig 2S1, Fig 3E-F, Fig 3S2A-C

      - 25 μW/mm^2 for behavior studies in Fig 4E-J

      - 1.16 μW/mm^2 for behavior studies in Fig 4K

      (3) Lines 150 - 152: Although the authors refer to "a stereotyped behavior sequence" in Fig 2D, there are no data supporting this claim in Fig 2. Rather, the data appear to represent proportions of different types of behavior at each time point, rather than behavior sequences. If the authors wish to claim that the data show stereotyped behavior sequences, they should analyze the data using a different method (e.g., Markov models).

      We agree that in the absence of additional analysis we should avoid commenting on stereotypy of behavior sequences; we therefore adjusted the text to reflect the tendency of nociceptive behaviors to precede non-nociceptive behaviors. The raster plots shown in Supplemental Fig. 2A illustrate this point: in larvae exhibiting nociceptive behaviors, these behaviors appear first, followed by backing and frequently freezing. As one quantitative readout of this sequence we show that the latency of rolling (nociceptive) is shorter compared with backing or freezing (non-nociceptive) (Fig. 2F, Fig. S2G).

      (4) Figure 3A-E: a cursory glance at the data suggests that the most responsive sensory neurons are C1da, with all sensory neurons activated. However, at the behavioral level, only some sensory neurons are activated. If all sensory neurons were activated by Chrimson, what behavioral phenotypes would the authors expect to see? Would it be the same as epidermal activation?

      The Reviewer raises an interesting question, but we intentionally avoid comparing the response properties among sensory neurons because of differences in driver strength. Likewise, extrapolating “activation” at the behavioral level is exceedingly difficult if/when multiple sensory neurons are simultaneously activated. In response to the Reviewer’s specific question, when all da neurons are activated simultaneously, larvae largely exhibited hunching rather than rolling (Hwang et al., 2007). We find that epidermal stimulation rarely elicits hunching; instead, epidermal stimulation generally triggers nocifensive behaviors followed by non-nocifensive behaviors such as backing and freezing, suggesting an order or priority in neurons activated by epidermal cells (or different response times). Defining the mechanisms by which epidermal cells communicate with different types of sensory neurons is therefore a top priority for future studies.

      (5) Figure 3S2: The behavioral phenotypes in Fig. 3E, F and Fig. 3S2 seem slightly different. I suggest adding some comments on how the behavioral phenotype differs depending on the GAL4 driver. Specifically, is there increased freezing in some genotypes (e.g., ppk-LexA or NompC-lexA)? Can you show this without TNT data? Is this a background effect or a GAL4-specific phenotype?

      We currently do not have the driver-only control for this experiment, but our effector-only control experiment (see Fig. 3S2A) suggests that larvae carrying the AOP-TNT insertion exhibit enhanced nociceptive behavioral responses. This point is addressed in our manuscript by the following (copied from the figure legend):

      “We note that although baseline rolling probability is elevated in all genetic backgrounds containing the AOP-LexA-TnT insertion, silencing C4da and C3da neurons significantly attenuates responses to epidermal stimulation.”

      (6) Calcium-free solution is used in Figure 3. Why do the authors still observe calcium influx? Does this mean that internal calcium stores are released? If so, does the calcium influx represent an action potential? How do the authors focus their LED stimulation to activate epidermal cells and avoid activation by the imaging laser?

      The specimens were imaged in calcium-free solution to minimize movement artifacts. However, the CNS is wrapped by glial cells and over short timescales such as those used for the imaging we speculate that extracellular calcium persists in the CNS.

      (7) It is unclear when animals begin to crawl after the epidermal cells are mechanically stimulated. How do Orai-dependent epidermal responses distinguish between peristaltic crawling and a poke? Although the in vitro experiments beautifully show responses to radial tension, it is unclear to what extent A-P axis tension (peristaltic crawling) and radial tension (poke) differ. It might be helpful to explain in the discussion section how epidermal cells are selectively activated.

      The Reviewer raises an interesting question about the types and thresholds of forces required to elicit epidermal responses. We cannot eliminate the possibility that peristaltic crawling (or crawling through a 3D substrate) stimulates epidermal cells to a certain degree. Indeed, our results demonstrate a dose-dependent response of Drosophila epidermal cells and human keratinocytes to radial stretch. However, we do not have any information about selectivity in response to different stimuli, though we agree that this is an intriguing avenue for future studies. For example, we do not know whether stretch-responsive cells are more or less responsive to poke. However, a salient feature of our studies is the recruitment of greater numbers of responders with increasing stimulus intensity; we therefore added the following statement to the discussion to clarify our model:

      “Finally, we find that epidermal cells exhibit a dose-dependent response to radial stretch; we therefore anticipate that the output of epidermal cells is likewise dependent on the stimulus intensity. Hence, rather than a fixed threshold beyond which epidermal cells are selectively activated, we hypothesize that increasing stimulus intensities drive increasing signal outputs to neurons.”

      (8) Some Protocols are missing. For example, in Figure 4, many stimulus combinations were used to test behavior. How were stimuli of different modalities applied to the animals? Further details need to be provided in the protocols.

      We thank the Reviewer for identifying this oversight. The methods section of our original submission detailed most of the stimulus combinations but omitted the opto + mechano combination (4F). We updated our methods to correct these omissions.

      (9) It might be helpful if the authors could provide a sample video for each behavior to clarify how they were each defined.

      Our manuscript includes a table with a detailed description of the behaviors (Table S2), and we added two annotated videos that show representative behavioral responses to optogenetic nociceptor or epidermis stimulation.

      (10) A supplementary summary table of genotypes might be helpful for the reader.

      Experimental genotypes are provided in the figure legends, and a detailed list of all alleles used in the study as well as their source is provided in supplemental table S1.

      Reviewer #2 (Public Review):

      Summary.

      The authors provide compelling evidence that stimulation of epidermal cells in Drosophila larvae results in the stimulation of sensory neurons that evoke a variety of behavioral responses. Further, the authors demonstrate that epidermal cells are inherently mechanoresponsive and implicate a role for store-operated calcium entry (mediated by Stim and Orai) in the communication to sensory neurons.

      Strengths.

      The study represents a significant advance in our understanding of mechanosensation. Multiple strengths are noted. First, the genetic analyses presented in the paper are thorough, with appropriate consideration of potential confounds. Second, behavioral studies are complemented by sophisticated optogenetics and imaging studies. Third, the identification of roles for store-operated calcium entry is intriguing. Lastly, conservation of these pathways in vertebrates raises the possibility that the described axis is also functional in vertebrates.

      Weaknesses.

      The study has a few conceptual weaknesses that are arguably minor. The involvement of store-operated calcium entry implicates ER calcium store release. Whether mechanical stimulation evokes ER calcium release in epidermal cells and how this might come about (e.g., which ER calcium channels, roles for calcium-induced calcium release etc.) remains unaddressed. On a related note, the kinetics of store-operated calcium entry are very distinct from those required for SV release. The link between SOC and epidermal cell-neuron transmission is not reconciled. Finally, it is not clear how optogenetic stimulation of epidermal cells results in the activation of SOC.

      (1) The involvement of store-operated calcium entry implicates ER calcium store release. Whether mechanical stimulation evokes ER calcium release in epidermal cells and how this might come about (e.g., which ER calcium channels, roles for calcium-induced calcium release etc.) remains unaddressed.

      Our studies suggest that mechanically evoked responses in epidermal cells involve both ER calcium release and store-operated calcium entry. Notably, we show that depletion of ER calcium stores before mechanical stimulation, by treating with thapsigargin, reduces (but does not eliminate) mechanically evoked calcium responses in fly epidermal cells (Fig. 6C-6F). Likewise, fly epidermal cells and human keratinocytes both exhibit mechanically evoked calcium responses in the absence of extracellular calcium (10 mM EGTA to chelate all free calcium ions). These data support a model whereby mechanical stimuli trigger calcium release from ER stores and influx. Indeed, several cell types have been shown to display mechanically evoked release of calcium from stores. For example, mechanical stimulation of enteroendocrine cells of the gut epithelium results in both calcium release from ER stores and calcium influx across the plasma membrane (Knutson et al., 2023). Similar to our findings, Knutson et al. found that depleting stores decreased mechanically evoked calcium signals by over 70% in these gut epithelial cells. In our revised manuscript we have more clearly emphasized these points.

      We agree with the reviewer that deciphering the mechanisms by which mechanical stimuli promote ER calcium release and subsequent store-operated calcium entry is an exciting topic to explore. One potential mechanism is the activation of a mechanosensitive receptor that promotes calcium release from the ER via calcium-induced calcium release or IP3 production, as has been proposed for enteroendocrine cells. A recent paper demonstrated that the ER itself is mechanosensitive and that mechanical stimuli promote calcium release via the opening of calcium-permeable ion channels in the ER membrane (Song et al., 2024). Determining the relative contributions of store-operated calcium entry and ER calcium release and deciphering their underlying mechanisms will require a thorough investigation of ER calcium channels and receptors; thus, we believe this is beyond the scope of the present manuscript and merits publication on its own. However, we now include this in our discussion as an exciting new direction we aim to pursue.

      (2) The kinetics of store-operated calcium entry are very distinct from those required for SV release. The link between SOC and epidermal cell-neuron transmission is not reconciled.

      The Reviewer raises an interesting point regarding the mode of epidermal cell-neuronal communication. We demonstrated a requirement for dynamin-dependent vesicle release from epidermal cells in mechanical sensitization. However, the nature of the vesicular pool, the mode and kinetics of release, and the type of neuromodulator released remain to be characterized. Hence, it is not clear that the kinetics of synaptic vesicle release are an appropriate comparison. Our studies do demonstrate that behavioral responses to optogenetic epidermal stimulation are relatively slow (on the order of seconds), which is not incompatible with the kinetics of store-operated calcium entry. Furthermore, the primary functional output we define for epidermal mechanosensory responses, mechanical nociceptive sensitization, is apparent 10 sec following the stimulus and persists for minutes in our behavior assays. Consistent with this model, studies of the mammalian touch dome have shown that touch-sensitive Merkel cells secrete neurotransmitters to modulate neurons and promote sustained action potential firing on a similar timescale. Likewise, mechanically evoked ER calcium release promotes sustained secretion of serotonin from enterochromaffin cells.

      (3) It is not clear how optogenetic stimulation of epidermal cells results in the activation of SOC.

      We appreciate the opportunity to clarify our results. We demonstrate that optogenetic epidermal stimulation elicits behavioral responses in larvae and calcium responses in somatosensory neurons, but we do not claim that optogenetic epidermal stimulation elicits SOC. Our optogenetic studies demonstrate the capacity for epidermal stimulation to modulate somatosensory function, but we characterize contributions of SOC only to mechanical stimuli which are more physiologically relevant. However, it is worth noting that CsChrimson is a calcium-permeable channel, suggesting that an increase in intracellular calcium may trigger epidermal-evoked neuronal responses and behaviors during optogenetic stimulation.

      References

      Hwang, RY, Zhong, L, Xu, Y, Johnson, T, Zhang, F, Deisseroth, K, and Tracey, WD (2007). Nociceptive neurons protect Drosophila larvae from parasitoid wasps. Curr Biol 17, 2105–2116.

      Knutson, KR, Whiteman, ST, Alcaino, C, Mercado-Perez, A, Finholm, I, Serlin, HK, Bellampalli, SS, Linden, DR, Farrugia, G, and Beyder, A (2023). Intestinal enteroendocrine cells rely on ryanodine and IP3 calcium store receptors for mechanotransduction. J Physiol 601, 287–305.

      Maksimovic, S, Nakatani, M, Baba, Y, Nelson, AM, Marshall, KL, Wellnitz, SA, Firozi, P, Woo, S-H, Ranade, S, Patapoutian, A, et al. (2014). Epidermal Merkel cells are mechanosensory cells that tune mammalian touch receptors. Nature 509, 617–621.

      Mikesell, AR, Isaeva, O, Moehring, F, Sadler, KE, Menzel, AD, and Stucky, CL (2022). Keratinocyte PIEZO1 modulates cutaneous mechanosensation. Elife 11, e65987.

      Song, Y, Zhao, Z, Xu, L, Huang, P, Gao, J, Li, J, Wang, X, Zhou, Y, Wang, J, Zhao, W, et al. (2024). Using an ER-specific optogenetic mechanostimulator to understand the mechanosensitivity of the endoplasmic reticulum. Dev Cell 59, 1396-1409.e5.

      Woo, S-H, Ranade, S, Weyer, AD, Dubin, AE, Baba, Y, Qiu, Z, Petrus, M, Miyamoto, T, Reddy, K, Lumpkin, EA, et al. (2014). Piezo2 is required for Merkel-cell mechanotransduction. Nature 509, 622–626.

    1. Author response:

      We appreciate the reviewers’ constructive comments and suggestions. We plan the following revisions to address the public reviews.

      Regarding model selection (from Reviewers 1 and 3)

      We will test whether the latent cause model has greater explanatory power for the observed reinstatement data than at least two other models, including the Rescorla-Wagner model. For each model, the prediction errors across all trials and those in the test 3 trial (reinstatement) will be calculated for individual animals. The explanatory power of the models will be discussed based on these results.

      Regarding model validation (from Reviewers 1, 2, and 3)

      We acknowledge the reviewers’ concerns about potential parameter overfitting and misinterpretation. First, we will run the latent cause model simulation under other plausible conditions to test whether our original condition is justified, and then clarify how specific parameters affect the predicted CR. Second, we will confirm whether the prediction errors are comparable between experimental groups, present the correlations between parameters, and discuss these results in the revision.

      To evaluate the effect of context in explaining reinstatement in the latent cause model, simulations of CR in test 3 when only context or tone is presented will also be performed and discussed with the behavioral data.

      Regarding the interpretation of the behavioral data (from Reviewers 1, 2, and 3)

      We will clarify our interpretation of the behavioral data by incorporating the additional analyses mentioned above; for example, to clarify the contribution of context in test 3, we will provide data on the CR before the tone presentation in our revision. In addition, we will further discuss how the memory modification characteristics estimated in the reinstatement test shaped our expectations for, and interpretation of, the reversal Barnes maze results.

      Regarding the application of the latent cause model to the reversal Barnes maze task (from Reviewers 1 and 2)

      We acknowledge the reviewers’ suggestions to apply the latent cause model to our Barnes maze results to strengthen the link and consistency. To further clarify the reason for including the Barnes maze results, we will explicitly discuss how associative learning is involved in spatial learning in the revision. However, we will not be able to apply the latent cause model directly to the Barnes maze data, for the following reasons. As we noted in the Results and Discussion, the latent cause model was built on associative learning and cannot be directly applied to the Barnes maze data. The cognitive processes in the Barnes maze task involve maintaining a spatial representation of the environment, integrating the animal’s own position with the expected goal, and evaluating potential actions. Importantly, the chosen actions in this task directly affect subsequent observations, whereas in a simple associative learning paradigm an animal’s response based on an expected outcome typically does not alter future observations.

      Thus, although associative learning (e.g., associations between the spatial cue and the location of the escape box) is certainly a critical building block and contributes to performance in the Barnes maze task, this mechanism alone cannot fully explain the animal’s navigation in the maze. We agree that solid modeling results for the reversal Barnes maze task are an important direction, but extending the latent cause model for this purpose is beyond the scope of this study. We have suggested some possible approaches in the Discussion and will elaborate further on these conceptual distinctions and on how the latent cause framework assists in the interpretation of the results.

    1. Author response:

      We thank the reviewers for their insightful feedback. Incorporating their recommendations will greatly enhance our manuscript for resubmission. Based on the reviews, a major challenge to the interpretation of our study appears to be whether locomotion itself is responsible for increased ACC activity during our task. This was a shared concern for us during our analysis, and we included data in our initial submission hoping to address it. Specifically, we show that post-action activity outlasts movement termination, in most cases by seconds (Supplementary Fig 2). Likewise, post-action activity is not tied to shuttle initiations, as ACC activity onset can vary greatly before and after initiation (Supplementary Fig 2). Lastly, the unique nature of action content neurons further supports a distinction from locomotor activity: they fire selectively for specific directions and, as a result, do not fire during movement in the opposite direction. Despite these findings, we agree with the reviewers that additional analyses, such as examining firing rates with respect to locomotion speed and acceleration/deceleration, will greatly strengthen our claim regarding ACC’s role in post-action activity. In our resubmission, we will perform such analyses, among others, to fully elucidate the role of locomotion in ACC post-action activity.

      Reviewers also pointed out an overall lack of detail regarding our task, analyses, statistical methods, and experimental approaches. We will consider all the reviewers’ recommendations and integrate them into our resubmission to provide more detailed information. Notably, we will adjust how we describe our task. Reviewers raised some criticism regarding the perceived novelty of the task, as it shares many similarities with previous discrimination-avoidance tasks. What distinguishes our task is that the meaning (safety vs shock) of the context and sensory stimuli changes dynamically with the current environment (context x sound). This requires not only discrimination of contextual and sensory stimuli but also inter-modal integration of stimuli, which varies throughout the task. Sound A/B leads to different outcomes depending on the context, and similarly, the meaning of the context shifts in a sound-dependent manner.

      Lastly, in our follow-up submission we will include more robust analyses that take advantage of the temporal sensitivity of our recordings. We will also provide greater clarity on how each individual animal contributes to our overall findings. To conclude, we would like to once again thank the reviewers for their feedback and evaluation of our manuscript. We look forward to making the necessary adjustments for our future submission.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors tackled the public concern about E-cigarettes among young adults by examining the lung immune environment in mice using single-cell RNA sequencing, discovering a subset of Ly6G- neutrophils with reduced IL-1 activity and increased CD8 T cells following exposure to tobacco-flavored e-cigarettes. Preliminary serum cotinine (nicotine metabolite) measurements validated the effective exposure to fruit, menthol, and tobacco-flavored e-cigarettes with air and PG/VG serving as control groups. They also highlighted the significance of metal leaching, which fluctuated over different exposure durations to flavored e-cigarettes, underscoring the inherent risks posed by these products. The scRNAseq analysis of e-cig exposure to flavors and tobacco demonstrated the most notable differences in the myeloid and lymphoid immune cell populations. Differentially expressed genes (DEGs) were identified for each group and compared against the air control. Further sub-clustering revealed a flavor-specific rise in Ly6G- neutrophils and heightened activation of cytotoxic T cells in response to tobacco-flavored e-cigarettes. These effects varied by sex, indicating that immune changes linked to e-cig use are dependent on gender. By analyzing the expression of various genes and employing gene ontology and gene enrichment analysis, they identified key pathways involved in this immune dysregulation resulting from flavor exposure. Overall, this study affirmed that e-cigarette exposure can suppress the neutrophil-mediated immune response, subsequently enhancing T cell toxicity in the lung tissue of mice.

      Strengths:

      This study used single-cell RNA sequencing to comprehensively analyze the impact of e-cigarettes on the lung. The study pinpointed alterations in immune cell populations and identified differentially expressed genes and pathways that are disrupted following e-cigarette exposure. The manuscript is well written, the hypothesis is clear, the experiments are logically designed with proper control groups, and the data is thoroughly analyzed and presented in an easily interpretable manner. Overall, this study suggested novel mechanisms by which e-cigs impact lung immunity and created a dataset that could benefit the lung immunity field.

      We thank the reviewer for identifying the strengths of our work.

      Weaknesses:

      The authors included a valuable control group - the PG/VG group, since PG/VG is the foundation of the e-liquid formulation. However, most of the comparative analyses use the air group as the control. Further analysis comparing the air group to the PG/VG group, and the PG/VG group to the individual flavored e-cig groups will provide more clear insights into the true source of irritation. This is done for a few analyses but not consistently throughout the paper. Flavor-specific effects should be discussed in greater detail. For example, Figure 1E shows that the Fruit flavor group exhibits more severe histological pathology, but similar effects were not corroborated by the single-cell data.

      We thank the reviewer for this query. We agree that the PG/VG group is the foundation of the e-liquid formulation, and hence comparisons with this group are important for understanding the effect of individual flavors on the cell populations. Although we compared the flavored e-cig groups with the PG/VG group, we did not discuss these comparisons in detail in the manuscript to avoid confusion in interpreting such a large dataset. However, we will include the comparisons with the PG/VG group as a Supplementary File in our revised manuscript to help interested readers interpret our omics data.

      While we agree that flavor-specific effects might be of interest, we did not explore them in detail because the sale of fruit-flavored e-liquids is now regulated in the US. Thus, from a regulatory point of view, the effects of tobacco- and menthol-flavored e-liquids hold the most interest. Because fruit flavors were on the market when this study was conducted, we have still included the data; however, studying them further was not the focus of this work. Nevertheless, interested readers will have access to our dataset for further analyses and interpretation of our results.

      The characterization of Ly6g+ vs Ly6g- neutrophils is interesting and potentially very impactful. Key results like this from scRNAseq analyses should be validated by qPCR and flow cytometry.

      Also, a recent study by Ruscitti et al. reported Ly6g+ macrophages in the lung, which can potentially confound the cell type analysis. A more detailed marker gene and sub-population analysis of the myeloid clusters could rule out this potential confounding factor.

      We agree with the reviewer that the loss of Ly6G on neutrophils is a very interesting finding, and we are in the process of designing neutrophil-specific experiments to study the impact of e-cig exposure on neutrophil maturation and function, which will be discussed in subsequent work by our group. However, to address the reviewer’s concerns, we are staining lung tissue samples from air- and differently flavored e-cig aerosol-exposed mice with Ly6G and S100A8 (a universal neutrophil marker) to assess the infiltration of Ly6g+ vs Ly6g- neutrophils in the lungs of exposed and unexposed mice. This will also address whether these populations are neutrophils or are of another myeloid origin, as suggested by recent publications. We will share these findings in the revised manuscript and update our interpretations accordingly.

      Reviewer #2 (Public review):

      This study provides some interesting observations on how different flavors of e-cigarettes can affect lung immunology, however there are numerous flaws including a low number of replicates and a lack of effective validation methods which reduces the robustness and rigor of the findings.

      Strengths:

      The strength of the study is the successful scRNA-seq experiment which gives good preliminary data that can be used to create new hypotheses in this area.

      We appreciate the reviewer for recognizing the strength of this work.

      Weaknesses:

      The major weakness is the low number of replicates and the limited analysis methods. Two biological n per group is not acceptable to base any solid conclusions. Any validatory data was too little (only cell % data) and did not always support the findings (e.g. Figure 4D does not match 4C). Often n seems to be combined and only one data point is shown, it is not at all clear how the groups were analyzed and how many cells in each group were compared.

      We thank the reviewer for the critique, which allows us to improve our analyses. We understand that the low number of replicates in this work makes it difficult to draw solid conclusions from the analyses, but this was a pilot study to understand changes in the mouse lung at the single-cell level upon acute exposure to flavored e-cig aerosols. So far, the e-cig field has primarily focused on toxicological studies that help regulatory bodies set standards and enforce laws to better regulate the manufacture, sale, and distribution of e-cig products. However, adolescents and young adults are still gaining access to these products, and there is little to no understanding of how this may affect lung health upon acute and chronic exposure. Single-cell technology is a powerful tool for analyzing gene expression changes within cell populations to study cell heterogeneity and function. However, it is costly, which makes conducting such analyses on large sample sizes difficult. This pilot study was designed to generate initial leads for future studies involving larger sample sizes and chronic exposures. Further, we still intend to share our results with the scientific community because such a dataset is valuable to a wider audience interested in the mechanistic underpinnings of e-cig exposure in vivo.

      We understand that the validations in our current work are limited, so we are in the process of conducting immunostaining to validate a few targets identified through this work. We also note that validating single-cell findings using classical methods such as ELISA, qPCR, or flow cytometry can be difficult, because many of these techniques still investigate the tissue as a whole, whereas the changes shown in single-cell analyses mainly pertain to a single cell type. This could explain why the scRNA seq results do not align with our flow cytometry findings. The findings from this pilot study have now better informed us in designing an effective flow panel for our future studies. Regarding the statistics and the number of cells for each analysis, we will provide a detailed account for each to allow better interpretation of our results.

      Only 71,725 cells means only 7,172 per group, which is 3,586 per animal - how many of these were neutrophils, T-cells, and macrophages? This was not shown and could be too low.

      We agree that the number of cells could be too low, and to avoid this problem we never studied gene expression variations at the finest level of cell identity. We classified the cell clusters into general annotations (myeloid, lymphoid, endothelial, stromal, and epithelial) and identified the changes in gene expression. Of these, only two clusters (myeloid and lymphoid), with more than ~1000 cells per cell type per group, were studied in detail. We will include the cell count information in the revised manuscript to allow better interpretation of our results.

      The dynamic range of RNA measurement using scRNAseq is known to be limited - how do we know whether genes are not expressed or just didn't hit detection? This links into the Ly6G negative neutrophil comment, but in general, the lack of gene expression in this kind of data should be viewed with caution, especially with a low n number and few cells.

      This is a well-made point, and we thank the reviewer for this comment. We agree that the dynamic range of RNA measurement is limited and that, for low cell numbers, this could lead to bias. We are in the process of validating the presence of Ly6G+ and Ly6G- cells in our control and treated lungs, the outcome of which will be discussed in the revised manuscript. We will also provide the cell number of the Ly6G- cell cluster for each sample, with a more detailed discussion of our findings. Because of the small sample size and limited cell capture, some limitations are hard to overcome; these will be further elaborated upon in our revisions.

      There is no rigorous quantification of Ly6G+ and Ly6G- cells in the flow cytometry data.

      We understand that flow-based quantification of our scRNA seq findings would be interesting. However, flow cytometry and the preparation of single-cell suspensions for sequencing were performed in parallel in this study. We used a basic flow panel with single markers to identify individual immune cell types. We identified changes in the Ly6G population between our treated and control samples using scRNA seq and intend to include Ly6G as a marker in future flow cytometry studies, but unfortunately the same analyses could not be performed for the current batch of samples. We will still include IHC staining results identifying the Ly6G+ and Ly6G- populations in lung tissues from control and treated mice in the revised manuscript to address some of the concerns raised here.

      Eosinophils are heavily involved in lung biology but are missing from the analysis.

      We used RBC lysis buffer to remove excess RBCs during lung digestion when preparing the single-cell suspension for scRNA seq in this study. Reports suggest that RBC lysis can adversely affect eosinophil number and function. We did not identify any cell cluster expressing eosinophil markers in our scRNA seq data, and we believe our lung digestion protocol could be the reason. We did study changes in eosinophil numbers in these samples by flow cytometry and found significant changes. However, because we could not find eosinophil clusters in the scRNA seq data, we did not include these results in the final manuscript. To avoid confusion and maintain transparency, we will include our flow cytometry results in the revised manuscript.

      The figures had no titles so were difficult to navigate.

      We will make necessary adjustments to the data representation and include the titles to enable easy navigation of the Figures.

      PG/VG is not defined and not introduced early enough.

      We agree that PG/VG (propylene glycol/vegetable glycerin) is an important control in e-cig studies. This was the reason this group was included, and we performed comparisons with this group in the scRNA seq studies as well. However, to reduce the complexity of the study, we only presented comparisons with the Air control in this manuscript. We will include the comparisons with the PG/VG group as a Supplementary File in the revised manuscript so that interested readers have access to these results and can make the necessary interpretations for future research.

      Neutrophils are not well known to proliferate, so any claims about proliferation need to be accompanied by validation such as BrdU or other proliferation assays.

      We thank the reviewer for this suggestion; however, we cannot perform BrdU or other proliferation assays on neutrophils at present. We plan to include these in the study designs of our future work, but limited funding prevents further experimentation to support this claim in this study. We state clearly that this is only a scRNA seq finding and requires further study, to avoid over-interpretation of our results.

      It was not clear how statistics were chosen and why Table S2 had a good comparison (two-way ANOVA with gender as a variable) but this was not used for other data particularly when looking at more functional RNA markers (Table S2 also lacks the interaction statistic which is most useful here).

      We thank the reviewer for raising this concern. This is a valid point, and we will include all the necessary information regarding the statistics and other related parameters in the revised manuscript.

      Many statistics are only vs air control, but it would be more useful as a flavor comparison to see these vs PG/VG. In some cases, the carrier PG/VG looks worse than some of the flavors (which have nicotine).

      We will include the comparisons with PG/VG as supplementary file in our revised manuscript, however we do not intend to describe all those changes in detail in the main manuscript.

      The n number is a large issue, but in Figures such as 4, 6, and 7 it could be a bigger factor. The number of significant genes identified has been determined by chance rather than any real difference, e.g. Is Il1b not identified in Fruit flavor vs air because there wasn't enough n, while in Air vs Tobacco, it randomly hit the significance mark. This is but an example of the problems with the analysis and conclusions.

      While we agree in part with the concern raised here, we wish to point out that every experiment has limitations. In our opinion, an omics study is not necessarily aimed at finding transcript-level changes with absolute certainty, but rather at identifying probable cell and gene targets to validate in subsequent work. We never claim that our findings are absolute outcomes; rather, we note the limitation of sample number and the need for further research at every step. The strength of this work is that it is the first study of its kind to examine changes in lung cell populations at the single-cell level upon e-cig aerosol exposure. This study has provided us with interesting gene and cell targets that we are now validating in future work. We still strongly believe that such a dataset is a useful resource for a wider audience designing efficient studies, and hence it merits publication and discussion among our peers.

      The data in Figure 7A is confusing, if this is a comparison to air, then why does air vs air not equal 1? Even if this was the comparison to the average of air between males and females, then this doesn't explain why CCL12 is >1 in both. Is this z-score instead? Regardless the data is difficult to interpret in this format.

      We thank the reviewer for pointing this out. We realize that the data might be difficult to understand because of the color scaling of the heatmap. We will change the graphical representation and include the actual fold-change values in our revised manuscript to allow easy interpretation of these results.

      Individual n was not shown for almost all experiments - e.g. Figure 1D - what is this representative of? Figure 2D - is this bulk-grouped data for all cells and all mice? The heatmaps are also pooled from 2n and don't show the variability.

      While we have included a pictorial representation of the n number in Figure 1A and stated the n number in the legend of each figure, we understand that this may be difficult to navigate. We will address this more clearly in the revised manuscript.

      However, with respect to the second comment, we respectfully disagree with the reviewer. Each scRNA seq dataset had 2 samples, one male and one female, which is clearly shown in the current figures. The pooling of cells mentioned in the comment happened when the cell suspension was prepared from each sex/group at the start of sequencing. We have no means of showing the variability among pooled samples, which we acknowledge as a shortcoming of our work. Thus, in the heatmaps and data analyses we have included all the information needed to uphold the transparency of our study design and data visualization for each figure, and we would like to retain the current representations.

      Reviewer #3 (Public review):

      This work aims to establish cell-type specific changes in gene expression upon exposure to different flavors of commercial e-cigarette aerosols compared to control or vehicle. Kaur et al. conclude that immune cells are most affected, with the greatest dysregulation found in myeloid cells exposed to tobacco-flavored e-cigs and lymphoid cells exposed to fruit-flavored e-cigs. The up-and-down-regulated genes are heavily associated with innate immune response. The authors suggest that a Ly6G-deficient subset of neutrophils is found to be increased in abundance for the treatment groups, while gene expression remains consistent, which could indicate impaired function. Increased expression of CD4+ and CD8+ T cells along with their associated markers for proliferation and cytotoxicity is thought to be a result of activation following this decline in neutrophil-mediated immune response.

      Strengths:

      (1) Single-cell sequencing data can be very valuable in identifying potential health risks and clinical pathologies of lung conditions associated with e-cigarettes considering they are still relatively new.

      (2) Not many studies have been performed on cell-type specific differential gene expression following exposure to e-cig aerosols.

      (3) The assays performed address several factors of e-cig exposure such as metal concentration in the liquid and condensate, coil composition, cotinine/nicotine levels in serum and the product itself, cell types affected, which genes are up- or down-regulated and what pathways they control.

      (4) Considerations were made to ensure clinical relevance such as selecting mice whose ages corresponded with human adolescents so that the data collected was relevant.

      We thank the reviewer for identifying the key strengths of our work and listing it in a concise and well-rounded fashion.

      Weaknesses:

      The exposure period of 1 hour a day for 5 days is not representative of chronic use and this time point may be too short to see a full response in all cell types. The experimental design is not well-supported based on the literature available for similar mouse models.

      This study was not designed to study the effects of chronic exposures on lung tissues. We were interested in delineating the effect of acute exposures for which the proposed study design was chosen. Previous work by our group has performed similar exposures and has been well received by the community. We understand that chronic exposures will be interesting to look at, however that was not the purpose of this pilot study. We will now explicitly mention this aspect in the revised manuscript.

      Several claims lack supporting evidence or use data that is not statistically significant. In particular, there were no statistical analyses to compare results across sex, so conclusions stating there is a sex bias for things like Ly6G+ neutrophil percentage by condition are observational.

      We thank the reviewer for this observation, and we will include the necessary validations and details of the sex-based statistical analyses in the revised version of this manuscript.

      Statistical analyses lack rigor and are not always displayed with the most appropriate graphical representation.

      We thank the reviewer and will include all the necessary statistical details in the revised manuscript.

      Overall, the paper and its discussion are relatively limited and do not delve into the significance of the findings or how they fit into the bigger picture of the field.

      We are in the process of performing a few validation experiments and intend to add a few other pieces of data to this manuscript to strengthen the overall merit of our findings. However, as the reviewer pointed out, the strength of this work lies in the first-ever scRNA seq analysis of mice exposed to differently flavored e-cig aerosols in vivo. We also show cell-specific differential gene expression and address some of the major questions in e-cig research, including the day-to-day release of metals from the same coil. The limited sample number makes it difficult to draw solid conclusions from this work, which we have discussed as a shortcoming. However, the major strength of this work lies not in identifying specific trends but in exploring possible cell and gene targets for expanding the study to longer (chronic) exposures with a larger sample group.

      The manuscript lacks validation of findings in tissue by other methods such as staining.

      We are conducting some studies and will include the validatory experiments and staining in the revised manuscript to support our findings.

      This paper provides a foundation for follow-up experiments that take a closer look at the effects of e-cig exposure on innate immunity. There is still room to elaborate on the differential gene expression within and between various cell types.

      We thank the reviewer for this observation. The cell numbers for some clusters (especially epithelial cells) were too low. So, although we performed differential gene expression analyses on all the cell clusters, we refrained from discussing them in the manuscript to avoid over-interpretation of our results. Only clusters with enough cells (~1000) per sex per group were used to plot the heatmaps. We will also include the cell numbers for each cell type in the revisions to allow better interpretation of our data. Furthermore, the raw data from this study will be freely available to the public upon publication of this manuscript, enabling interested readers to study the cell types of interest in detail based on their own requirements. These data will be a useful resource for the community in informing and designing future studies.

    1. Author response:

      Reviewer #1:

      A) The presentation of the paper must be strengthened. Inconsistencies, mislabelling, duplicated text, typos, and inappropriate colour code should be changed.

      We will revise the manuscript to correct the abovementioned issues.

      B) Some claims are not supported by the data. For example, the sentence that says that "adolescent mice showed lower discrimination performance than adults" (l.22) should be rewritten, as the data does not show that for the easy task (Figure 1F and Figure 1H).

      We will carefully review, verify claims, and correct conclusions where needed.

      C) In Figure 7 for example, are the quantified properties not distinct across primary and secondary areas?

      We will analyse the data in Figure 7 separately for AUDp and secondary auditory cortices to test regional differences. Additionally, we will provide a table summarizing key neuronal firing properties for each area during passive recordings to clarify how activity varies across cortical subregions and developmental stages.

      D) Some analysis interpretations should be more cautious. (..) A lower lick rate in general could reflect a weaker ability to withhold licking, as indicated on l.164, but also many other things (e.g., a lower frustration threshold, lower satiation, or more energy).

      We will address the issues around lick bias, including alternative explanations such as differences in motivation or impulsivity.

      Reviewer #2:

      A) For some of the analyses that the authors conducted it is unclear what the rationale behind them is and, consequently, what conclusion we can draw from them.

      We will edit the discussion and clarify these points. In addition, we will adjust and extend the methodology section to clarify the rationale of our analysis.

      B) The results of the optogenetic manipulation, while very interesting, warrant a more in-depth discussion.

      We agree that the effects observed in our optogenetic manipulation warrant further discussion. We will extend the analysis and discussion of ACx silencing.

      Reviewer #3:

      A) One fact that could help shed light on this would be to know how often the animals licked the spout in between trials. Finally, for the head-fixed version of the task, only d' values are reported. Without the corresponding hit and false alarm rates (and frequency of licking in the intertrial interval), it's hard to know what exactly the animals were doing.

      We recognize the need for a more nuanced analysis for the head-fixed version of the task. We will extend the behavioral analysis and provide more details to clarify these points.

      B) There are some instances where the citations provided do not support the preceding claim. For example, in lines 64-66, the authors highlight the fact that the critical period for pure tone processing in the auditory cortex closes relatively early (by ~P15). However, one of the references cited (ref 14) used FM sweeps, not pure tones, and even provided evidence that the critical period for this more complex stimulus occurred later in development (P31-38). Similarly, on lines 72-74, the authors state that "ACx neurons in adolescents exhibit high neuronal variability and lower tone sensitivity as compared to adults." The reference cited here (ref 4) used AM noise with a broadband carrier, not tones.

      We appreciate the reviewer pointing out instances where our citations may not fully support our claims. We will carefully review the relevant citations and revise them to ensure they accurately reflect the findings of the cited studies. We will update references in lines 64–66 and 72–74 to better align with the specific stimulus types and developmental timelines discussed.

      C) Given that the authors report that neuronal firing properties differ across auditory cortical subregions (as many others have previously reported), why did the authors choose to pool neurons indiscriminately across so many different brain regions?

      We agree that pooling neurons from multiple auditory cortical regions could potentially obscure region-specific differences. However, we addressed this concern by analyzing regional differences in neuronal firing properties, as shown in Supplementary Figures S4-1 and S4-2, and Supplementary Tables 2 and 3. Additionally, we examined stimulus-related and choice-related activity across regions and found no significant differences, as presented in Supplementary Figure S4-3. Please see our response to Reviewer 1, where we further elaborate on this point.

      D) And why did they focus on layers 5/6? (Is there some reason to think that age-related differences would be more pronounced in the output layers of the auditory cortex than in other layers?)

      We acknowledge that other cortical layers are also of interest and may contribute differently to auditory processing across development. Our focus on layers 5/6 was motivated by both methodological considerations and biological relevance. These layers contain many of the principal output neurons of the auditory cortex, and are therefore well positioned to influence downstream decision-making circuits. We will clarify this rationale in the revised manuscript and note the limitations of our approach.

    1. Author response:

      Reviewer #1 (Public Review):

      The work of Umetani et al. monitors the death of about 100,000 cells caused by lethal antibiotic treatments in a microfluidic device. They observe that the surviving bacteria are either in a dormant or in a non-dormant state prior to the antibiotic treatment. They then study the relative abundances of these different persister cells when varying the physiological state of the culture. In agreement with previous observations, they observe that late stationary phase cultures harbor a high number of dormant persister cells and that this number goes down as the culture is more exponential but remains non-zero, suggesting that cultures at the exponential phase contain different types of persister bacteria. These results were qualitatively similar in both rich and poor media. Further characterization of the growing persister bacteria shows that they often form L-forms, have low RpoS-mCherry expression levels, and grow only slightly more slowly than the non-persister bacteria. Taken together, these results draw a detailed view of persister bacteria and the way they may survive extensive antibiotic treatments. However, in order to represent a substantial advance on previous knowledge, a deeper analysis of the persister bacteria should be done.

      We thank the reviewer for suggesting more detailed analyses of persister cells. As described in our response to Essential Revision 1, we have added a new section titled "Response of growing persisters to Amp exposure is heterogeneous" (Pages 11-12), which presents detailed analyses of single-cell growth and morphology dynamics over the pre-exposure, exposure, and post-exposure periods (Fig. 2D and H, Fig. 4B and D, Fig. 4 – figure supplement 1 and 2, Fig. 5B and D, Fig. 5 – figure supplement 1, Fig. 8B and D, and Fig. 8 – figure supplement 1). These new results reveal (i) differential responses to Amp treatment among growing persister cells (Fig. 4A-D, Fig. 4 – figure supplement 1, Fig. 4 – figure supplement 2A, Fig. 5A-D, and Fig. 5 – figure supplement 1); (ii) comparable division rates of MG1655 between non-surviving cells and persister cells growing prior to antibiotic treatment (Fig. 4E and Fig. 8E), except for post-exponential phase MF1 populations treated with Amp in LB medium and post-exponential phase MG1655 populations treated with Amp in M9 medium (Fig. 4 – figure supplement 2B and Fig. 5E); and (iii) the presence of CPFX persister cells that avoid filamentation after treatment (Fig. 8C and D, and Fig. 8 – figure supplement 1). We believe that these analyses provide new insights into the diverse dynamics and survival modes of antibiotic persistence at the single-cell level and represent important contributions to the field.

      Reviewer #2 (Public Review):

      The main question asked by Umetani et al. is whether persister cells to ampicillin arise preferentially from dormant, non-dividing cells or from cells that are actively growing before antibiotic exposure. The authors tracked persister cells generated from populations at different growth phases and in different culture media using a microfluidic device coupled to fluorescence microscopy, which is challenging due to the low frequency of these persister cells. One of the main conclusions is that the majority of persisters arising in exponentially growing populations originated from actively dividing cells before the antibiotic treatment, reinforcing the idea that dormancy is not a prerequisite for persister formation. The authors made use of a fluorescent reporter monitoring RpoS activity (RpoS-mCherry fusion) and observed that RpoS levels in these persister cells were low. In the few lineages that exhibited no growth before the ampicillin treatment, RpoS levels were low as well, indicating that RpoS is not a predictive marker for persistence. By performing the same experiment with early and late stationary phase cultures, the authors observed that the proportion of persister cells that originated from dormant cells before the ampicillin treatment is significantly increased under these conditions. In the late stationary phase condition, dormant cells were expressing high levels of RpoS. The authors suggested that RpoS-mCherry proteins form aggregates, which they proposed to be a characteristic of 'deep dormancy'. These cells were mostly unable to restart growth after antibiotic removal, while others with the lowest levels of RpoS tended to be persisters. Confirming that these cells indeed contain protein aggregates, as well as determining their physiological state, appears to be crucial.

      We thank Reviewer #2 for pointing out the critical issue with the RpoS-mCherry fusion that we used to quantify single-cell RpoS expression levels in the original manuscript. As explained in our replies to the comments below, we performed the suggested experiment and confirmed that tagging RpoS with mCherry impaired its function. To resolve this issue, we repeated almost all the experiments using the wild-type strain MG1655 and confirmed that the main results are reproducible (Fig. 3, Fig. 3 – figure supplement 1, and Fig. 7). Because of this change of the main strain, we removed the results on the correlation between RpoS expression and the persistence trait from the revised manuscript, as they may not reflect the behavior of intact RpoS. However, we retained some of the results obtained with the MF1 strain, such as the population killing curves and the survival mode analyses, because they also provide insight into the role of RpoS in antibiotic persistence. In particular, we found both beneficial and detrimental effects of RpoS on antibiotic persistence, depending on culture conditions and the duration of antibiotic treatment (Fig. 1 – figure supplement 3 and Fig. 6 – figure supplement 1). We have therefore included these results and the related discussion in the revised manuscript.

      Reviewer #3 (Public Review):

      In their manuscript, Umetani et al. address the question of the origin of persister bacteria using single-cell approaches. Persistence refers to a physiological state where bacteria are less sensitive to antibiotic therapy, although they have not acquired a resistance mutation; importantly, the concept of persistence has been refined in the past decade to distinguish it from tolerance, where bacteria are only transiently insensitive. Since persister cells are very rare in growing populations (typically 1e-5 or 1e-6), it is very challenging to observe them directly. It had been proposed that individual cells surviving antibiotics are not growing at the start of the treatment, but recent studies (nicely reviewed in the introduction) where persister bacteria were observed directly do not support this link. Following a similar line, the authors nonetheless still aim at "investigating whether non-growing cells are predominantly responsible for bacterial persistence". Based on new experimental data, they claim the contrary, that most surviving cells were "actively growing before drug exposure" and that their work "reveals diverse survival pathways underlying antibiotic persistence".

      We thank the reviewer for this helpful comment, which suggested to us that some revisions in our Introduction would better place our study in the context of previous understanding of antibiotic persistence. As mentioned in our response to Essential Revision 4 and the second comment of Reviewer 1's Recommendations for the authors, we have modified the Introduction to more appropriately place our study in the context of the field.

      The main strengths of the manuscript are in my opinion:

      - To report on direct observation of E. coli persisters to ampicillin (200µg/mL) in 5 different growth media (typically 20 persisters or more per condition, one condition with only 12), which constitutes without a doubt an experimental tour de force.

      - To aim at bridging the population level and the single-cell level by measuring relevant variables for each and analyzing them jointly.

      - To demonstrate that in most conditions a large fraction of surviving cells was actively growing before drug exposure.

      In addition, although it is well known that E. coli doesn't need to maintain its rod shape to survive and divide, I found the extent to which morphology can be affected in persister cells and their progeny in their data very remarkable, since this really challenges our understanding of E. coli's "lifestyle" (these swimming amoeba-like cells in Supp Video 11 are mind-blowing!).

      We are grateful to the reviewer for articulating the strengths of this study.

      Unfortunately, these positive aspects are counter-balanced by several shortcomings in the way experiments are analyzed and interpreted, which I explain below. Moreover, the manuscript is written in a way that makes it very hard to find important information on how experiments are done and is likely to leave the reader with an impression of confusion about what the main findings actually are.

      We thank the reviewer for pointing out these important issues in the original manuscript. Please see our replies below regarding how we addressed each specific comment. To make the experimental methods and procedures more accessible and interpretable, we have added more explanation of the experimental details to the Results and Methods sections. Furthermore, since we understood that some of the confusion arose from the insufficient explanation of the preculture procedures for the microfluidic experiments, we have modified the schematic illustration of the method shown in Fig. S1 of the original manuscript and moved it to the first main figure of the revised manuscript (Fig. 1C and D). We have also added an illustration that explains the cultivation procedures for the batch culture experiments as Fig. 6A.

      My major concerns are the following:

      (1) The main interpretation framework proposed by the authors is to assess whether cells not growing before drug exposure (so-called "dormant") are more or less likely to survive the treatment than growing ones ("non-dormant"). Fig 2A and Fig 3G show the main conclusions of the article from this perspective, that growing cells can survive the treatment and that the fraction of persisters in a given condition is not explained by the fraction of "dormant" cells, respectively. With this analysis, the authors essentially assume that "dormant" cells are of the same type in their different conditions, which ignores the progress in this field over the last decade (Balaban et al. 2019). I argue on the contrary that the observation of "diverse modes of survival in antibiotic persistence" is expected from their experimental design. In particular, the sensitivity of E. coli to beta-lactams such as ampicillin is expected to be much lower during the lag out of the stationary phase, a phenomenon which has been coined "tolerance"; hence in the Late Stationary condition, two subpopulations coexist for which different response to ampicillin is expected. I propose steps toward a more compelling interpretation of the experimental data. Should this point be taken seriously by the authors, it, unfortunately, implies a major rewriting of the article, including its title.

      We thank the reviewer for bringing to our attention the point that may have caused confusion in the original manuscript. 

      The primary purpose of this manuscript was not to assess whether non-growing cells prior to drug exposure are more or less likely to survive treatment than growing cells. Rather, we wanted to examine how different persister cell dynamics emerge at the single-cell level depending on previous cultivation history, growth media, and antibiotic types. We believe that this point is clearer in the revised manuscript with the newly added single-cell dynamics data (Fig. 2D, 2H, 4B, 4D, Fig. 4 – figure supplement 1 and 2A, Fig. 5B, 5D, Fig. 5 – figure supplement 1, Fig. 8B, 8D, and Fig. 8 – figure supplement 1). 

      We also did not mean to imply that "dormant cells" were of the same type under different conditions, as we were aware of the diversity of cellular states of non-growing cells, as well as the reduced sensitivity of cells to antibiotics during the lag out of stationary phase. We believe that one of the reasons this point may have been unclear is that in the previous version we had referred to all cells that were not growing prior to antibiotic treatment as "dormant cells", a term that is often used in a more restricted way to refer to cells under prolonged growth arrest. Therefore, in the revised manuscript, we have avoided the term "dormant cells" and instead simply referred to these as "non-growing cells". Accordingly, we have changed the title of the paper from "Observation of non-dormant persister cells reveals diverse modes of survival in antibiotic persistence" to "Observation of persister cell histories reveals diverse modes of survival in antibiotic persistence".

      To further address these points, we have improved the description of the experimental procedures for the single-cell measurements (see also the reviewer's next comment). Owing to the experimental design, the non-growing persisters of the MF1 strain found in the post-exponential phase cell populations must be of a different type from those found in the post-early and post-late stationary phase cell populations. All early and late stationary phase cells were maintained in a non-growing state by flowing conditioned media prepared from the early and late stationary phase cultures until the start of the time-lapse measurements. Thus, aside from potential physiological heterogeneity, the non-growing cells prior to drug treatment are all long-lagging cells. In contrast, for the post-exponential phase condition, we maintained exponential growth conditions from the start of the second pre-culture to the start of antibiotic treatment, including the period of sample preparation for time-lapse measurements. Given the exponential dilution of the cell populations by growth during this period, the non-growing persisters are unlikely to be long-lagging cells (see our response to Reviewer 2's third comment in "Recommendations for the authors"). We now describe these experimental procedures in more detail in the Results section (L161-178, L287-297). In addition, we discuss the diversity of cellular states of both non-growing and growing cells in the Discussion, citing the literature (L545-557).

      (2) The way the authors describe their experiments with bacteria in the stationary phase is very problematic. For instance, they write that they "sampled cells from early and late stationary phases (...) and exposed them to 200 μg/mL of Amp in both batch and single-cell cultures." For any reader in a hurry (hence skipping methods and/or supplementary figure), this leads to believe that bacteria sampled in the stationary phase were exposed to the drug right away (either by adding the drug to the stationary phase sample, or more classically by transferring cells to fresh media with antibiotics). However, it turns out that, after sampling and loading in the microfluidic device, bacteria are grown 2 h in LB (or 4 h in M9) - I don't know what to think of such a blatant omission. The names chosen for each condition should reflect their most important aspects, here "stationary" is simply not appropriate - maybe something like "post early stationary" instead. In any case, I believe that this point highlights further the misconception pointed out in 1 and implies that the average reader will be at best confused, and probably misled.

      We again thank the reviewer for pointing out the insufficient explanation of the method for the single-cell measurements and the helpful recommendation regarding our nomenclature for different conditions. As mentioned above, we now present the previous supplementary figure that schematically explains the experimental procedure as the first main figure to clarify how we prepared the cells loaded into the microfluidic device for single-cell measurements (Fig. 1C and D). Also, following the reviewer's suggestion, we now refer to the conditions as "post-exponential phase," "post-early stationary phase," and "post-late stationary phase" in the revised manuscript. 

      We included a 2-hour (or 4-hour in M9) cultivation period in fresh medium in the batch cultures used for measuring killing curves, to make the cultivation conditions prior to antibiotic treatment as similar as possible between batch and microfluidic experiments. We have clarified that post-early stationary and post-late stationary phase cell populations undergo this pre-exposure cultivation in fresh medium before antibiotic treatment (L264-269, Fig. 6A), so that readers can more easily recognize the experimental conditions.

      (3) Figures 4 and 5 are of very minor significance, and the methodology used in Fig 4 is questionable. The authors measure the abundance of an RpoS-mCherry translational fusion because its "high expression has been suggested to predict persistence". The rationale for this (that an RpoS-mCherry fusion would be a proxy for intracellular ppGpp levels, and in turn predict persistence) has never been firmly established, and the standards used in the article where this reporter was introduced (Maisonneuve, Castro-Camargo, and Gerdes 2013) are notoriously low (which eventually led to its retraction) - I don't know what to think of the fact that the authors cite a review by this group rather than their retracted article. While transcriptional fusions of promoters regulated by RpoS have been proposed to measure its regulatory activity (Patange et al. 2018), the combination of self-regulation and complex post-translational regulation of rpoS makes the physical meaning of the reporter used here completely unclear. Moreover, this translational fusion is introduced without doing any of the necessary controls to demonstrate that the activity of RpoS is not impaired by the addition of the fluorescent protein. Fig 5 simply reports the existence of persisters to ciprofloxacin growing before the treatment. This might be a new observation but it is not unexpected given that a similar observation has been made with a similar drug, ofloxacin (Goormaghtigh and van Melderen 2019), as pointed out in the introduction. There is no further quantitative claim on this.

      We thank the reviewer for pointing out the issue of the RpoS-mCherry fusion. As we mentioned in our response to Essential Revision 2 and also to the comment from reviewer #2, we have tested the sensitivity of this fluorescent reporter strain to oxidative stress and confirmed that it is as sensitive as the rpoS strain (Fig. 1 – figure supplement 1C). Therefore, the RpoS function seems to be defective in this strain, as now explained in Results (L69-79). After confirming the problem with the RpoS-mCherry fusion, we removed all analyses and related arguments that relied on the RpoS expression level (previous Figure 4). In addition, we repeated almost all the experiments with the original MG1655 strain to confirm that the observed results are not specific to the problematic reporter strain. 

      Regarding the experiments with CPFX, we have added a more detailed analysis of single-cell dynamics and found that, contrary to the reported results for ofloxacin, not all persister cells show filamentation after drug withdrawal (Fig. 8C and D, Fig. 8 – figure supplement 1). In addition, we performed new microfluidic experiments in which we treated post-late stationary phase cells with CPFX (Fig. 3). In contrast to the Amp treatment result and to the previous study that reported the persistence of post-stationary phase cell populations to ofloxacin (ref. 20), all the persisters for which we identified the pre-exposure growth traits in this condition grew normally prior to CPFX treatment. These newly added analyses and experiments clarify the significance of the CPFX experiments.

      (4) The authors don't mention the dead volume nor the speed of media exchange in their device. Hopefully, it is short compared to the duration of the treatment; however, it is challenging to remove all antibiotics after the treatment, and even 1e-3 or 1e-4 of the treatment concentration may be sufficient to affect regrowth in fresh media. If this is described in another article, it would be worth adding a comment in the main text.

      We thank the reviewer for bringing up this important point. We have added the perfusion chamber volume and medium flow rate information in the Methods section (L809-817).   

      In a previous study involving two of the authors, the medium exchange rate across the semipermeable membrane was evaluated in a similar device with similar microchamber dimensions (ref. 26). There, we confirmed that the medium exchange was completed within 5 min, which is much shorter than the antibiotic treatment period and the post-treatment period used to observe regrowth. We have also included this information in the main text with the reference (L58-63).

      Despite the relatively high medium exchange rate, we cannot formally exclude the possibility that a small amount of antibiotic may remain in the device, e.g. due to non-specific adsorption on the internal surface of the microchambers. In such cases, the residual antibiotics may influence the physiological states of the cells and the regrowth kinetics in the post-exposure periods, as suggested by the reviewer. However, the frequencies of persister cells in our single-cell measurements are comparable to those in the batch culture measurements, suggesting that antibiotic removal in our device is at least as efficient as in the batch culture assay. To clarify this point, we have added a paragraph to the Discussion with a reference that reviews the influence of antibiotics at concentrations significantly lower than the MICs (L482-489).

      (5) Fig 2A supports the main finding that a significant fraction of bacteria surviving the treatment are growing before drug exposure, but it uses a poorly chosen representation.

      - In order to compare between conditions, one would like to see the fraction of each type in the population.

      - The current representation (of a fraction of each type among surviving cells) requires a side-by-side comparison with a random sample (which will practically be equivalent to the fraction of each type among killed cells) in order to be informative.

      We have changed the style of the previous Fig. 2A to show the fraction of each type in the population instead of the fraction of each type among surviving cells (Fig. 3 and Fig. 3-figure supplement 1).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aims to identify the proteins that compose the electrical synapse, which are much less understood than those of the chemical synapse. Identifying these proteins is important to understand how synaptogenesis and conductance are regulated in these synapses. The authors identified more than 50 new proteins and used immunoprecipitation and immunostaining to validate their interaction or localization. One new protein, a scaffolding protein, shows particularly strong evidence of being an integral component of the electrical synapse. However, many key experimental details are missing (e.g. mass spectrometry), making it difficult to assess the strength of the evidence.

      Strengths:

      One newly identified protein, SIPA1L3, has been validated both by immunoprecipitation and immunohistochemistry. The localization at the electrical synapse is very striking.

      A large number of candidate interacting proteins were validated with immunostaining in vivo or in vitro.

      Weaknesses:

      There is no systematic comparison between the zebrafish and mouse proteome. The claim that there is "a high degree of evolutionary conservation" was not substantiated.

      We agree that we should have included a comprehensive comparison of proteins captured in the different species.  We are assembling this table and it will be included in the revised manuscript.  There is, indeed, significant conservation of many of the proteins enriched in both species.

      No description of how mass spectrometry was done and what type of validation was done.

      Since the mass spec was outsourced to a core facility, we had not included methodological details. We have requested these and will include full details in the revised version of the manuscript. In terms of "validation," enrichment of proteins at electrical synapses was determined based on capture relative to control samples (non-transgenic zebrafish retinas or non-transgenic mouse retinas infected with the dGBP-TurboID virus) captured and processed at the same time. Actual validation based on protein co-localization and pull-downs is the subject of the rest of the manuscript and could only be done for a fraction of the identified proteins. This type of validation can be pursued in many future studies.

      The threshold for enrichment seems arbitrary.

      Yes, the thresholds are somewhat arbitrary. This is because experiments that captured larger total amounts of protein (mouse retina samples) had a higher signal-to-noise ratio than those that captured smaller total amounts of protein (zebrafish retina). This allowed us to use a more stringent threshold in the mouse dataset to focus on high-probability captured proteins.

      Inconsistent nomenclature and punctuation usage.

      We have scanned through the manuscript and updated terms that were used inconsistently in the interim revision of the manuscript.

      To describe the mass spec procedure, we will get in touch with the mass spec facility and provide the details in the next round of submission.

      The description of figures is very sparse and error-prone (e.g. Figure 6).

      In Figure 1B, there is very broad non-specific labeling by avidin in zebrafish (In contrast to the more specific avidin binding in mice, Figure 2B). How are the authors certain that the enrichment is specific at the electrical synapse?

      The enrichment of the proteins we identified is specific to electrical synapses because we compared the abundance of all candidates between Cx35b-V5-TurboID and wild-type retinas. Proteins that are components of electrical synapses will only show up in the Cx35b-V5-TurboID condition. The western blot (Strep-HRP) in Figure 1C shows the differences in streptavidin labeling and hence the enrichment of proteins that are part of electrical synapses. Moreover, although the background appears quite abundant in sections, biotinylation is a rare posttranslational modification that mainly occurs in carboxylases, which correspond to the two intense bands above 50 and 75 kDa. The background mainly originates from these two proteins.

      In Figure 1E, there is very little colocalization between Cx35 and Cx34.7. More quantification is needed to show that it is indeed "frequently associated."

      We agree that “frequently associated” is too strong as a statement. We corrected this and instead wrote “that Cx34.7 was only expressed in the outer plexiform layer (OPL) where it was associated with Cx35b at some gap junctions” in line 150. There are many gap junctions at which Cx35b is not colocalized with Cx34.7. 

      Expression of GFP in HCs would potentially be an issue, since GFP is fused to Cx36 (regardless of whether HC expresses Cx36 endogenously) and V5-TurboID-dGBP can bind to GFP and biotinylate any adjacent protein.  

      Thank you for this suggestion! There should be no Cx36-GFP expression in horizontal cells, which means that the nanobody has nothing to bind in these cells. Moreover, to distinguish specific signals from non-specific background, we included wild-type retinas throughout all experiments. This condition controls for non-specific biotinylation.

      Figure 7: the description does not match up with the figure regarding ZO-1 and ZO-2.

      It appears that a portion of the figure legend was left out of the submitted version of the manuscript.  We have put the legend for panels A through C back into the manuscript in the interim revision.

      Reviewer #2 (Public review):

      Summary:

      This study aimed to uncover the protein composition and evolutionary conservation of electrical synapses in retinal neurons. The authors employed two complementary BioID approaches: expressing a Cx35b-TurboID fusion protein in zebrafish photoreceptors and using GFP-directed TurboID in Cx36-EGFP-labeled mouse AII amacrine cells. They identified conserved ZO proteins and endocytosis components in both species, along with over 50 novel proteins related to adhesion, cytoskeleton remodeling, membrane trafficking, and chemical synapses. Through a series of validation studies (including immunohistochemistry, in vitro interaction assays, and immunoprecipitation), they demonstrate that the novel scaffold protein SIPA1L3 interacts with both Cx36 and ZO proteins at electrical synapses. Furthermore, they identify and localize proteins ZO-1, ZO-2, CGN, SIPA1L3, Syt4, SJ2BP, and BAI1 at AII/cone bipolar cell gap junctions.

      Strengths:

      The study demonstrates several significant strengths in both experimental design and validation approaches. First, the dual-species approach provides valuable insights into the evolutionary conservation of electrical synapse components across vertebrates. Second, the authors compare two different TurboID strategies in mice and demonstrate that the HKamac promoter and GFP-directed approach can successfully target the electrical synapse proteome of mouse AII amacrine cells. Third, they employed multiple complementary validation approaches (including retinal section immunohistochemistry, in vitro interaction assays, and immunoprecipitation), providing evidence supporting the presence and interaction of these proteins at electrical synapses.

      Weaknesses:

      The conclusions of this paper are supported by data; however, some aspects of the quantitative proteomics analysis require clarification and more detailed documentation. The differential threshold criteria (>3 log2 fold for mouse vs >1 log2 fold for zebrafish) would benefit from biological justification, particularly given the cross-species comparison. Additionally, providing details on the number of biological or technical replicates used in this study, along with analyses of how these replicates compare to each other, would strengthen confidence in the identification of candidate proteins. Furthermore, including negative controls for the histological validation of proteins interacting with Cx36 would increase the reliability of the staining results.

      While the study successfully characterized the presence of candidate proteins at the electrical synapses between AII amacrine cells and cone bipolar cells, it did not compare protein compositions between the different types of electrical synapses within the circuit. Given that AII amacrine cells form both homologous (AII-AII) and heterologous (AII-cone bipolar cell) electrical synapses, which serve distinct functional roles in retinal signal processing, a comparative analysis of their molecular compositions could have provided important insights into synapse specificity.

      Reviewer #3 (Public review):

      Summary:

      This study by Tetenborg S et al. identifies proteins that are physically closely associated with gap junctions in retinal neurons of mice and zebrafish using BioID, a technique that labels and isolates proteins proximal to a protein of interest. These proteins include scaffold proteins, adhesion molecules, chemical synapse proteins, components of the endocytic machinery, and cytoskeleton-associated proteins. Using a combination of genetic tools and meticulously executed immunostaining, the authors further verified the colocalizations of some of the identified proteins with connexin-positive gap junctions. The findings in this study highlight the complexity of gap junctions. Electrical synapses are abundant in the nervous system, yet their regulatory mechanisms are far less understood than those of chemical synapses. This work will provide valuable information for future studies aiming to elucidate the regulatory mechanisms essential for the function of neural circuits.

      Strengths:

      A key strength of this work is the identification of novel gap junction-associated proteins in AII amacrine cells and photoreceptors using BioID in combination with various genetic tools. The well-studied functions of gap junctions in these neurons will facilitate future research into the functions of the identified proteins in regulating electrical synapses.

      Thank you for these comments.

      Weaknesses:

      I do not see major weaknesses in this paper. A minor point is that, although the immunostaining in this study is beautifully executed, the quantification to verify the colocalization of the identified proteins with gap junctions is missing. In particular, endocytosis component proteins are abundant in the IPL, making it unclear whether their colocalization with gap junctions is above chance level (e.g. EPS15l1, HIP1R, SNAP91, ITSN in Figure 3B).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      By way of background, the Jiang lab has previously shown that loss of the type II BMP receptor Punt (Put) from intestinal progenitors (ISCs and EBs) caused them to differentiate into EBs, with a concomitant loss of ISCs (Tian and Jiang, eLife 2014). The mechanism by which this occurs was activation of Notch in Put-deficient progenitors. How Notch was upregulated in Put-deficient ISCs was not established in this prior work. In the current study, the authors test whether a very low level of Dl was responsible. But co-depletion of Dl and Put led to a similar phenotype as depletion of Put alone, suggesting that Dl was not the mechanism. They next investigate genetic interactions between BMP signaling and Numb, an inhibitor of Notch signaling. Prior work from the Bardin, Schweisguth and other labs has shown that Numb is not required for ISC self-renewal. However, the authors wanted to know whether loss of both the BMP signal transducer Mad and Numb would cause ISC loss. This result was observed for RNAi depletion from progenitors and for mad, numb double mutant clones. Of note, ISC loss was observed in 40% of mad, numb double mutant clones, whereas 60% of these clones had an ISC. They then employed a two-color tracing system called RGT to look at the outcome of ISC divisions (asymmetric (ISC/EB) or symmetric (ISC/ISC or EB/EB)). Control clones had 69%, 15% and 16%, respectively, whereas mad, numb double mutant clones had much lower ISC/ISC (11%) and much higher EB/EB (37%). They conclude that loss of Numb in moderate BMP loss-of-function mutants increased symmetric differentiation, which led to ISC loss. They also reported that numb<sup>15</sup> and numb<sup>4</sup> clones had a moderate but significant increase in ISC-lacking clones compared to control clones, supporting the model that Numb plays a role in ISC maintenance. Finally, they investigated the relevance of these observations during regeneration. After bleomycin treatment, there was a significant increase in ISC-lacking clones and a significant decrease in clone size in numb<sup>4</sup> and numb<sup>15</sup> clones compared to control clones. Because bleomycin treatment has been shown to cause variation in BMP ligand production, the authors interpret the results for numb clones after bleomycin treatment as demonstrating an essential role of Numb in ISC maintenance during regeneration.

      Strengths:

      (i) Most data is quantified with statistical analysis

      (ii) Experiments have appropriate controls and large numbers of samples

      (iii) Results demonstrate an important role of Numb in maintaining ISC number during regeneration and a genetic interaction between Mad and Numb during homeostasis.

      Weaknesses:

      (i) No quantification for Fig. 1

      Quantification of Fig. 1 has been added.

      (ii) The premise is a bit unclear. Under homeostasis, strong loss of BMP (Put) leads to loss of ISCs, presumably regardless of Numb level (which was not tested). But moderate loss of BMP (Mad) does not show ISC loss unless Numb is also reduced. I am confused as to why numb does not play a role in Put mutants. Did the authors test whether concomitant loss of Put and Numb leads to even more ISC loss than Put mutation alone?

      We have tested the genetic interaction between put and numb using Put RNAi and Numb RNAi driven by esg<sup>ts</sup>. According to the results in this study and our previously published data, put mutant clones or esg<sup>ts</sup> > Put-RNAi induced a rapid loss of ISCs (within 8 days). We did not observe further enhancement of the stem cell loss phenotype in Put and Numb double RNAi guts.

      (iii) I think that the use of the word "essential" is a bit strong here. Numb plays an important role, but during either homeostasis or regeneration, most numb clones or mad, numb double mutant clones still have ISCs. Therefore, I think that the authors should temper their language about the role of Numb in ISC maintenance.

      We have revised the language and changed “essential” to “important”.

      Reviewer #2 (Public review):

      Summary:

      This work assesses the genetic interaction between the Bmp signaling pathway and the factor Numb, which can inhibit Notch signalling. It follows up on the previous studies of the group (Tian, Elife, 2014; Tian, PNAS, 2014) regarding BMP signaling in controlling stem cell fate decision as well as on the work of another group (Sallé, EMBO, 2017) that investigated the function of Numb on enteroendocrine fate in the midgut. This is an important study providing evidence of a Numb-mediated backup mechanism for stem cell maintenance.

      Strengths:

      (1) Experiments are consistent with these previous publications while also extending our understanding of how Numb functions in the ISC.

      (2) Provides an interesting model of a "backup" protection mechanism for ISC maintenance.

      Weaknesses:

      (1) Aspects of the experiments could be better controlled or annotated:

      (a) As they "randomly chose" the regions analyzed, it would be better to have all from a defined region (R4 or R2, for example) or to at least note the region as there are important regional differences for some aspects of midgut biology.

      Thank you for the suggestion. In fact, we conducted all the analyses in region 4; we have added a statement to clarify this in the revised manuscript.

      (b) It is not clear to me why MARCM clones were induced and the flies were then grown at 18°C. It would help to explain why they used this unconventional protocol.

      We kept the flies at 18°C to avoid spontaneous clone formation.

      (2) There are technical limitations with trying to conclude from double-knockdown experiments in the ISC lineage, such as those in Figure 1 where Dl and put are both being knocked down: depending on how fast both proteins are depleted, it may be that only one of them (put, for example) is inactivated and affects the fate decision prior to the other one (Dl) being depleted. Therefore, it is difficult to definitively conclude that the decision is independent of Dl ligand.

      In our hands, Dl-RNAi is very effective and resulted in loss of N pathway activity (as determined by the N pathway reporter Su(H)-lacZ) after RNAi for 8 days (Fig. 1D). Therefore, the ectopic Su(H)-lacZ expression in Punt Dl double RNAi guts (Fig. 1E) is unlikely to be due to residual Dl expression. Nevertheless, we have changed the statement “BMP signaling blocks ligand-independent N activity” to “Loss of BMP signaling results in ectopic N pathway activity even when Dl is depleted”.

      (3) Additional quantification of many phenotypes would be desired.

      (a) It would be useful to see esg-GFP cells/total cells and not just per field, as the density might change (2E for example).

      We focused on the R4 region for quantification, where cell density did not appear to differ between experimental groups. In addition, we examined many guts for quantification. It is therefore very unlikely that the difference in esg-GFP+ cell number is caused by a change in cell density.

      (b) Similarly, for 2F and 2G, it would be nice to see the % of ISC/ total cell and EB/total cell and not only per esgGFP+ cell.

      Unfortunately, we didn’t have the suggested quantification. However, we believe that quantification of the percentage of ISC or EB among all progenitor cells, as we did here, provides a meaningful measurement of the self-renewal status of each experimental group.

      (c) Fig1: There is no quantification - specifically it would be interesting to know how many esg+ are su(H)lacZ positive in Put- Dl- condition compared to WT or Put- alone. What is the n?

      Quantification of Fig. 1 has been added.

      (d) Fig2: Pros + cells are not seen in the image? Are they all DllacZ+?

      Anti-Pros and anti-E(spl)mβ-CD2 were stained in the same channel (magenta). Pros+ cells exhibited “dot-like” nuclear staining, while CD2 staining outlined the cell membrane of EBs. We have clarified this in the revised figure legend.

      (e) Fig3: it would be nice to have clone size quantification instead of the distribution among groups of 2-cell, 3-cell and 4-cell clones.

      Because of the heterogeneity of clone size for each genotype, we chose to group clones based on their sizes (2, 3-6, 6-8, >8 cells) and quantified the distribution of individual groups for each genotype, which clearly showed an overall reduction in clone size for mad numb double mutant clones. We and others have used the same clone size analysis in previous studies (e.g., Tian and Jiang, eLife 2014).

      (f) How many times were experiments performed?

      All experiments were performed at least 3 times.

      (4) The authors do not comment on the reduction of clone size in DSS treatment in Figure 6K. How do they interpret this? Does it conflict with their model of Bleo vs DSS?

      Guts containing numb<sup>4</sup> clones treated with DSS exhibited a slight reduction of clone size, evident from a higher percentage of 2-cell clones and a lower percentage of >8-cell clones. This reduction is less pronounced in guts containing numb<sup>15</sup> clones. However, the percentage of Dl<sup>+</sup>-containing clones is similar between DSS and mock-treated guts. It is possible that ISC proliferation is slightly reduced due to the numb<sup>4</sup> mutation or the genetic background of this stock.

      (5) There is probably a mistake in the sentence on lines 314-316: "Indeed, previous studies indicate that endogenous Numb was not undetectable by Numb antibodies that could detect Numb expression in the nervous system".

      We have modified the sentence.

      Reviewer #3 (Public review):

      Summary:

      The authors provide an in-depth analysis of the function of Numb in the adult Drosophila midgut. Based on RNAi combinations and double mutant clonal analyses, they propose that Numb inhibits the Notch pathway to maintain intestinal stem cells and acts as a backup mechanism alongside the BMP pathway in maintaining midgut stem cell-mediated homeostasis.

      Strengths:

      Overall, this is a carefully constructed series of experiments, and the results and statistical analyses provide believable evidence that Numb has a role, albeit weak compared to other pathways, in sustaining ISCs and in promoting regeneration, especially after damage by bleomycin, which may damage enterocytes and therefore disrupt the BMP pathway more. The results overall support their claim.

      The data are highly coherent, and support a genetic function of Numb, in collaborating with BMP signaling, to maintain the number and proliferative function of ISCs in adult midguts. The authors used appropriate and sophisticated genetic tools of double RNAi, mutant clonal analysis and dual marker stem cell tracing approaches to ensure the results are reproducible and consistent. The statistical analyses provide confidence that the phenotypic changes are reliable albeit weaker than many other mutants previously studied.

      Weaknesses:

      In the absence of Numb itself, the midgut has a weak reduction of ISC number (Fig. 3 and 5), as well as a weak, albeit not statistically significant, reduction of ISC clone size/proliferation. I think the authors published similar experiments with BMP pathway mutants. The mad<sup>1-2</sup> allele used here, as discussed below, may not be very representative of other BMP pathway mutants. Therefore, it could be beneficial to compare ISC number and clone sizes with those from other BMP experiments to provide readers with a clearer picture of how these two pathways individually contribute (stronger/weaker effects) to ISC number and gut homeostasis.

      Thanks for the comment. We tested other components of the BMP pathway in our previous study (Tian et al., 2014). More complete loss of BMP signaling (for example, Put clones, Put RNAi, Tkv/Sax double mutant clones or double RNAi) resulted in ISC loss regardless of numb status, suggesting a more predominant role of BMP signaling in ISC self-renewal compared with Numb. We speculate that the weak stem cell loss phenotype associated with numb mutant clones in an otherwise wild-type background could be due to fluctuation of BMP signaling in homeostatic guts.

      The main weakness of this manuscript is the analysis of the BMP pathway components, especially the mad<sup>1-2</sup> allele. The mad RNAi and mad<sup>1-2</sup> alleles (P insertion) are supposed to be weak alleles, which might make them suitable for genetic enhancement assays here together with numb RNAi. However, the mad<sup>1-2</sup> allele, and sometimes the mad RNAi, showed weakly increased ISC clone size. This is somewhat counterintuitive, as one would expect them to show similar ISC loss and reduced ISC clone size.

      We used mad<sup>1-2</sup> and mad RNAi here to test the genetic interaction with numb because our previous studies showed that partial loss of BMP signaling under these conditions did not cause stem cell loss; these conditions may therefore provide a sensitized background to determine the role of Numb in ISC self-renewal. The increased ISC proliferation and clone size associated with mad<sup>1-2</sup> and mad RNAi arise because reduction of BMP signaling in either ECs or EBs non-autonomously induces stem cell proliferation. However, in mad numb double mutant clones, there was a reduction in clone size due to loss of ISCs in many clones.

      A much stronger phenotype was observed when numb mutants were subjected to treatment with the tissue-damaging agent bleomycin, which causes damage in different ways than DSS. Bleomycin has previously been shown to cause mainly enterocyte damage, and is therefore more likely to disrupt BMP signaling from ECs. Accordingly, this treatment together with loss of numb led to a highly significant reduction of ISCs in clones and a reduction of clone size/proliferation. One possible improvement: it is not clear whether the authors discussed the nature of the two numb mutant alleles used in this study and how their strength compares to that of the RNAi line. Because the phenotypes are weak and more variable, the use of specific reagents is important.

      We have included information about the two numb alleles in the “Materials and Methods”. numb<sup>15</sup> is a null allele, and the nature of numb<sup>4</sup> has not been elucidated. According to Domingos, P.M. et al., numb<sup>15</sup> induced a more severe phenotype than numb<sup>4</sup> did. Consistently, we also found that more numb<sup>15</sup> mutant clones were void of stem cell than numb<sup>4</sup> mutant clones.

      Furthermore, the use of possible activating alleles of either or both pathways to test genetic enhancement or synergistic activation will provide strong support for the claims.

      Activation of BMP signaling (esgts>Tkv<sup>CA</sup>) alone induced stem cell tumors (Tian et al., 2014), whereas overexpression of Numb did not increase stem cell number, although overexpression of Numb in wing discs produced phenotypes indicative of N inhibition (our unpublished observation). This makes it difficult to test the synergistic effect of activating both BMP and Numb.

      Reviewer #1 (Recommendations for the authors):

      - Cartoon of RGT in Fig 4 needs to be improved. We need to know what chromosome harbors the esgts. It is not sufficient to simply put the location of the ubi-GFP and ubi-RFP (on 19A) and not show the location of other components of the RGT system.

      Thank you for the suggestion. We have revised the cartoon in Fig. 4 to include all three pairs of chromosomes and indicate where the esgts driver and UAS-RNAi are located. In addition, we have included the genotypes for all the genetic experiments in the Method section.

      - Quantification of the results in Fig. 1

      Quantification of Fig. 1 has been added.

      - The authors need to explain the premise more carefully (see above) and explain whether or not they tested put, numb double knockdowns.

      We have addressed this point above (see our response to point (ii)).

      Reviewer #2 (Recommendations for the authors):

      The number of times the experiments have been performed would be useful to include.

      This information has been added in the figure legends.

    1. Author response:

      We thank the reviewers for their thoughtful comments on our submitted manuscript.

      The major point from all three reviewers was that the sensory inputs may be more complex than simply ASH and AWC, since mutations in osm-9 and tax-4 will affect many more sensory neurons. We fully agree. The differential effects of osm-9 and tax-4 allowed us to recognize that there were two distinct afferent pathways operating simultaneously, mediating repulsion and attraction separately. However, it remains to be determined which sensory neurons are contributing to each pathway. We have planned a full analysis of the sensory inputs, not limited to just ASH and AWC, using neuron-specific rescue and neuron-specific chemogenetic inactivation (using HisCl1). While this analysis falls outside the scope of the present study, we will perform the inactivations of ASH and AWC and include the data in the revised version of this study. We expect to demonstrate whether ASH and AWC inputs are sufficient or whether other sensory neurons make significant contributions. Additionally, we will include chemotaxis dose-response data for osm-9 mutants as part of this analysis and make the requested minor corrections to data presentation.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We are disappointed that the reviewers do not acknowledge that our data constitute a major step forward for the field. We will prepare a revised version that takes care of the remaining small issues concerning the technical descriptions and a detailed response to the current round of comments. We will also add a summary of the major new findings of our study.


      The following is the authors’ response to the original reviews.

      We appreciate the time of the reviewers and their detailed comments, which have helped to improve the manuscript.

      Our study presents the largest systematic dataset so far on the evolution of sex-biased gene expression in animals. It is also the first to explore the patterns of individual variation in sex-biased gene expression, and the SBI is an entirely new procedure to directly visualize these variance patterns in an intuitive way.

      Also, we would like to point out that our study contradicts recent conclusions suggesting that a substantial set of sex-biased genes has conserved functions between humans and mice, and that mice can therefore be informative for gender-specific medicine studies. Our data suggest that only a very small set of genes are conserved in their sex-biased expression between mice and humans in more than one organ.

      In the revised version we have made the following major updates:

      - added a rate comparison of gene regulation turnover between sex-biased and non-sex-biased genes

      - added additional statistics to the variance comparisons and selection tests

      - added a regulatory module analysis that shows that much of the gene turnover happens within modules

      - added a mosaic pattern analysis that shows the individual complexity of sex-biased patterns

      - extended introduction and discussion

      Reviewer #1 (Public Review):

      The authors describe a comprehensive analysis of sex-biased expression across multiple tissues and species of mouse. Their results are broadly consistent with previous work, and their methods are robust, as the large volume of work in this area has converged toward a standardized approach.

      I have a few quibbles with the findings, and the main novelty here is the rapid evolution of sex-biased expression over shorter evolutionary intervals than previously documented, although this is not statistically supported. The other main findings, detailed below, are somewhat overstated.

      (1) In the introduction, the authors conflate gametic sex, which is indeed largely binary (with small sperm, large eggs, no intermediate gametic form, and no overlap in size) with somatic sexual dimorphism, which can be bimodal (though sometimes is even more complicated), with a large variance in either sex and generally with a great deal of overlap between males and females. A good appraisal of this distinction is at . This distinction in gene expression has been recognized for at least 20 years, with observations that sex-biased expression in the soma is far less than in the gonad.

      For example, the authors frame their work with the following statement:

      "The different organs show a large individual variation in sex-biased gene expression, making it impossible to classify individuals in simple binary terms. Hence, the seemingly strong conservation of binary sex-states does not find an equivalent underpinning when one looks at the gene-expression makeup of the sexes"

      The authors use this conflation to set up a straw man argument, perhaps in part due to recent political discussions on this topic. They seem to be implying one of two things. a) That previous studies of sex-biased expression of the soma claim a binary classification. I know of no such claim, and many have clearly shown quite the opposite, particularly studies of intra-sexual variation, which are common - see https://doi.org/10.1093/molbev/msx293, https://doi.org/10.1371/journal.pgen.1003697, https://doi.org/10.1111/mec.14408, https://doi.org/10.1111/mec.13919, https://doi.org/10.1111/j.1558-5646.2010.01106.x for just a few examples. Or b) They are the first to observe this non-binary pattern for the soma, but again, many have observed this. For example, many have noted that reproductive or gonad transcriptome data cluster first by sex, but somatic tissue clusters first by species or tissue, then by sex (https://doi.org/10.1073/pnas.1501339112, https://doi.org/10.7554/eLife.67485)

      Figure 4 illustrates the conceptual difference between bimodal and binary sexual conceptions. This figure makes it clear that males and females have different means, but in all cases the distributions are bimodal.

      I would suggest that the authors heavily revise the paper with this more nuanced understanding of the literature and sex differences in their paper, and place their findings in the context of previous work.

      We are sorry that our introduction seems to have been too short to make our points sufficiently clear. Of course, overlapping somatic variation has been shown for morphological characters, but we were aiming to assess this at the sex-biased transcriptome level. Previous studies looking at sex-biased genes were usually limited by the techniques that were available at their times, resulting in a focus on gonads in most studies and almost all have too few individuals included to study within-group variation. We detail this below for the papers that are mentioned by the referee. In view of this, we cite them now as examples for the prevalent focus on gonadal comparisons in most studies. Only Scharmann et al. 2021 on plant leaf dimorphism is indeed relevant for our study with respect to its general findings and we make now extensive reference to it. In addition, we have generally modified the introduction and substantially extended the discussion to make our points clear.

      Snell-Rood 2010: the paper focuses on sex-specific morphological structures in beetles. It samples six somatic tissues from four individuals of each class. Analysis is done via microarray hybridizations. While categorical differences were traced, variability between individuals was not discussed. By today's standards, microarrays have too much technical variability to even consider such a discussion.

      Pointer et al. 2013: this paper studies three sexual phenotypes in a bird species, females, dominant males and subordinate males. Tissues include telencephalon, spleen and left gonad. The focus of the analysis is on the gonads, since only few sex-biased genes were found in spleen and brain (according to suppl. Table S1, 0 for the spleen and 2 for the brain). No inferences could be made on somatic variation.

      Harrison 2015: this paper focuses on gonads plus spleen in six bird species, with 2-6 individuals collected for each sex. In the spleen, only one female-biased gene and no male-biased gene was detected. Hence, the data do not allow inference of patterns of somatic variation.

      Dean et al. 2016: this paper compares four categories of fish caught around nests, with four to seven individuals per category. Only gonads were analyzed, hence no inferences could be made about somatic variability between individuals.

      Cardoso et al. 2017: this paper tests categories of fish with alternative reproductive tactics based on brain transcriptomes. While it uses 9-10 individuals per category, it sequences pools, with two pools per category. This does not allow any inference on individual variation.

      Todd et al. 2017: this paper focuses on three categories of a fish species: females, dominant males and sneaker males. It uses brain and gonads as samples, with five individuals for each category. For the brain, more differentially expressed genes were found between the two types of males than between females and males (3 and 9 respectively). The paper focuses on individual gene descriptions and does not mention somatic variation.

      Scharmann 2021: the paper focuses on 10 species of plants with sexually dimorphic leaves. 5-6 individuals were sampled per sex. The major finding is that sex-biased gene expression does not correlate with the degree of sexual dimorphism of the leaves. The study also shows fast evolution of sex-biased expression and states that signatures of adaptive evolution are weak. But it does not discuss variance patterns within populations.

      (2) The authors also claim that "sexual conflict is one of the major drivers of evolutionary divergence already at the early species divergence level." However, making the connection between sex-biased genes and sexual conflict remains fraught. Although it is tempting to use sex-biased gene expression (or any form of phenotypic dimorphism) as an indicator of sexual conflict, resolved or not, as many have pointed out, one needs measures of sex-specific selection, ideally fitness, to make this case (https://doi.org/10.1086/595841, 10.1101/cshperspect.a017632). In many cases, sexual dimorphism can arise in one sex only without conflict (e.g. 10.1098/rspb.2010.2220). As such, sex-biased genes alone are not sufficient to discriminate between ongoing and resolved conflict.

      We invoke sexual conflict as a driver of genomic divergence patterns in the same way as many authors have done before (e.g. Mank 2017a, Price et al. 2023, Tosto et al. 2023). While we fully appreciate the referee's point, we do not see where we deviate from the standard wording used in the context of genomic data. For such data, it is of course usually assumed that they represent resolved conflicts (Figure 1D in Cox and Calsbeek), where selection differentials would not be measurable anyway. (Please also note that the phylogenetic approach used in Oliver and Monteiro 2010 becomes rather problematic in view of introgressive hybridization patterns in butterflies.) We have extended the discussion to address this.

      (3) To make the case that sex-biased genes are under selection, the authors report alpha values in Figure 3B. Alpha value comparisons like this over large numbers of genes often have high variance. Are any of the values for male- female- and un-biased genes significantly different from one another? This is needed to make the claim of positive selection.

      Sorry, we had accidentally omitted the statistics from the final version of the figure. We have now added them to the supplementary table, and we have also generally changed the statistical approach and the design of the figure.

      Reviewer #2 (Public Review):

      The manuscript by Xie and colleagues presents transcriptomic experiments that measure gene expression in eight different tissues taken from adult female and male mice from four species. These data are used to make inferences regarding the evolution of sex-biased gene expression across these taxa. The experimental methods and data analysis are appropriate; however, most of the conclusions drawn in the manuscript have either been previously reported in the literature or are not fully supported by the data.

      We are not aware of any study that has analyzed somatic sex-biased expression in such a large and taxonomically well-resolved set of closely related animal taxa. Only the study by Scharmann et al. 2021 on plant leaves comes close, but even this did not specifically analyze aspects of intragroup variation. Of course, some of our results confirm previous conclusions, but we would still like to point out that they go far beyond them.

      There are two ways the manuscript could be modified to better strengthen the conclusions.

      First, some of the observed differences in gene expression have very little to no effect on other phenotypes, and are not relevant to medicine or fitness. Selectively neutral gene expression differences have been inferred in previous studies, and consistent with that work, sex-biased and between-species expression differences in this study may also be enriched for selectively neutral expression differences. This idea is supported by the analysis of expression variance, which indicates that genes that show sex-biased expression also tend to show more inter-individual variation. This perspective is also supported by the MK analysis of molecular evolution, which suggests that positive selection is more prevalent among genes that are sex-biased in both mus and dom, and genes that switch sex-biased expression are under less selection at the level of both protein-coding sequence and gene expression.

      We have now revisited these points by additional statistical analysis of the variance patterns and an extended discussion under the heading "Neutral or adaptive?". 

      As an aside, I was confused by (line 176): "implying that the enhanced positive selection pressure is triggered by their status of being sex-biased in either taxon." - don't the MK values suggest an excess of positive selection on genes that are sex-biased in both taxa?

      There are different sets of genes that are sex-biased in these two taxa; hence this observation is actually a strong argument for selection on these genes. We have changed the corresponding text to make this clearer.

      Without an estimate of the proportion of differentially expressed genes that might be relevant for broader physiological or organismal phenotypes, it is difficult to assess the accuracy and relevance of the manuscript's conclusions. One (crude) approach would be to analyze subsets of genes stratified by the magnitude of expression differences; while there is a weak relationship between expression differences and fitness effects, on average large gene expression differences are more likely to affect additional phenotypes than small expression differences.

      We agree that it remains a challenge to show functional effects for the sex-biased genes. The argument that they should have a function is laid out above (and stated in many reviews on the topic). Using expression level as a proxy for function does not seem justified, given the current literature. For example, genes that are highly connected in modules are not necessarily highly expressed (e.g. transcription factors). Also, genes may be highly expressed in a rare cell type of an organ and have an important function there, but this would not show up in RNA extracted from the whole organ. The most direct functional relationship between sex-biased expression and phenotype comes from the human data in Naqvi et al. 2019, which we had cited.

      Another perspective would be to compare the within-species variance to the between-species variance to identify genes with an excess of the latter relative to the former (similar logic to an MK test of amino acid substitutions).

      Such an analysis was actually our initial motivation for this study. However, the new (and surprising!) result is that the status of being sex-biased shows such a high turnover that only a few genes per organ are left on which such a test could even be attempted. Nevertheless, we have extended the variance analysis with reciprocal gene sets (as we had done for the MK test) and extended the discussion on the topic, including citation of our prior work on these questions.
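      The reviewer's suggestion of contrasting between-species with within-species variance can be illustrated with a one-way ANOVA F ratio per gene, i.e. the between-taxon mean square divided by the within-taxon mean square. This is a minimal sketch and not an analysis performed in the study: the function name `variance_ratio` is hypothetical, the example values are invented, and in practice one would probably apply it to log-transformed TPM values.

```python
from statistics import mean


def variance_ratio(groups):
    """One-way ANOVA F ratio for one gene (hypothetical helper).

    groups: one list of expression values per taxon.
    Returns the between-taxon mean square divided by the within-taxon mean
    square; large values mean that taxa differ more than individuals within a taxon.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)  # zero if every taxon is invariant: ratio undefined
    return ms_between / ms_within


# Invented log-expression values for one gene in two taxa (three animals each).
print(variance_ratio([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # 13.5
```

      The same logic as an MK test applies: a gene would stand out when its between-taxon component is large relative to its within-taxon component.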

      Second, the analysis could be more informative if it distinguished between genes that are expressed across multiple tissues in both sexes that may show greater expression in one sex than the other, versus genes with specialized function expressed solely in (usually) reproductive tissues of one sex (e.g. ovary-specific genes). One approach to quantify this distinction would be metrics like those used defined by [Yanai I, et al. 2005. Genome-wide midrange transcription profiles reveal expression-level relationships in human tissue specification. Bioinformatics 21:650-659.] These approaches can be used to separate out groups of genes by the extent to which they are expressed in both sexes versus genes that are primarily expressed in sex-specific tissue such as testes or ovaries. This more fine-grained analysis would also potentially inform the section describing the evolution/conservation of sex-biased expression: I expect there must be genes with conserved expression specifically in ovaries or testes (these are ancient animal structures!) but these may have been excluded by the requirement that genes be sex-biased and expressed in at least two organs.

      Given that our study focuses on somatic sex-biased genes, we refrain in this paper from a comparative analysis of genes that are expressed only in the sex organs. With respect to sharing of sex-biased gene expression between the somatic tissues, we show in Figure 8 that there are very few such genes (8 female-biased and 3 male-biased). A separate statistical treatment is not possible for this small set of genes.
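      The tissue-specificity index tau from Yanai et al. (2005), which the reviewer refers to, averages 1 - x_i / max(x) over the n organs of a gene and divides by n - 1, so that 0 means uniform expression and 1 means expression in a single organ. The sketch below assumes log2(TPM + 1) input, a common but not universal choice; the function name `tau` and the example values are illustrative only.

```python
import math


def tau(expression):
    """Tissue-specificity index tau (Yanai et al. 2005) for one gene.

    expression: TPM values, one per organ. Values are log2(TPM + 1)-transformed
    first (an assumption; tau can also be computed on untransformed values).
    Returns 0 for uniform expression and 1 for expression in a single organ.
    """
    values = [math.log2(x + 1) for x in expression]
    peak = max(values)
    if len(values) < 2 or peak == 0:
        raise ValueError("need at least two organs and some expression")
    return sum(1 - v / peak for v in values) / (len(values) - 1)


print(tau([0, 0, 0, 15]))  # 1.0: expressed in one organ only
print(tau([7, 7, 7, 7]))   # 0.0: uniform across organs
```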

      There are at least three examples of statements in the discussion that at the moment misinterpret the experimental results.

      The discussion frames the results in the context of sexual selection and sexually antagonistic selection, but these concepts are not synonymous. Sexual selection can shape phenotypes that are specific to one sex, causing no antagonism; and fitness differences between males and females resulting from sexually antagonistic variation in somatic phenotypes may not be acted on by sexual selection. Furthermore, the conditions promoting and consequence of both kinds of selection can be different, so they should be treated separately for the purposes of this discussion.

      We cannot make such a distinction for gene expression patterns, and we are not aware that this has been done before in the literature (except where gene expression was directly linked to a morphological structure). We have updated this discussion accordingly.

      The discussion claims that "Our data show that sex-biased gene expression evolves extremely fast" but a comparison or expectation for the rate of evolution is not provided. Many other studies have used comparative transcriptomics to estimate rates of gene expression evolution between species, including mice; are the results here substantially and significantly different from those previous studies? Furthermore, the experimental design does not distinguish between those gene expression phenotypes that are fixed between species as compared to those that are polymorphic within one or more species which prevents straightforward interpretation of differences in gene expression as interspecific differences.

      Our statement referred to the comparison between somatic and gonadal gene turnover, as well as to the comparison with humans. We have now included an additional analysis for a direct comparison with non-sex-biased genes in the same populations (Figure 2B). Note that gene expression variances cannot become fixed anyway; they can only differ in mean and magnitude.

      The conclusion that "Our results show that most of the genetic underpinnings of sex differences show no long-term evolutionary stability, which is in strong contrast to the perceived evolutionary stability of two sexes" - seems beyond the scope of this study. This manuscript does not address the genetic underpinnings of sex differences (this would involve eQTL or the like), rather it looks at sex differences in gene expression phenotypes.

      This comes back to the points discussed above about the validity to infer function from sex-biased expression. We have updated the text to clarify this.

      Simply addressing the question of phenotypic evolutionary stability would be more informative if genes expressed specifically in reproductive tissues were separated from somatic sex-biased genes to determine if they show similar patterns of expression evolution.

      Our study is generally focused on somatic gene expression. The comparison with reproductive tissues serves merely as a reference. Since they are of course very different tissues, they should not be compared with each other in the same way. We have now specifically addressed this point in the discussion.

      Reviewer #3 (Public Review):

      This manuscript reports some interesting and important patterns. The results on sex-bias in different tissues and across four taxa would benefit from alternative (or additional) presentation styles. In my view, the most important results are with respect to alpha (fraction of beneficial amino acid changes) in relation to sex-bias (though the authors have made this a somewhat minor point in this version).

      The part that the authors emphasize I don't find very interesting (i.e., the sexes have overlapping expression profiles in many nongonadal tissues), nor do I believe they have the appropriate data necessary to convincingly demonstrate this (which would require multiple measures from the same individual).

      This is the first study that reports such overlaps, and we show that this is not always the case (e.g. liver and kidney data in mice). We are not aware of any predictions of what such patterns would look like and how they would evolve, so why should such a new finding not be interesting? Concerning the appropriateness of the data, we do not agree with the point the referee makes; see the response below.

      This study reports several interesting patterns with respect to sex differences in gene expression across organs of four mice taxa. An alternative presentation of the data would yield a clearer and more convincing case that the patterns the authors claim are legitimate.

      I recommend that the authors clarify what qualifies as "sex-bias".

      This is defined by the statistical criteria that we have applied, following the general standard of papers on this topic.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) "However, already Darwin has pointed out that the phenotypes of the sexes should evolve fast". I think the authors mean that Darwin was quick to point out that sex-specific phenotypes evolve quickly.

      We have modified this text part.

      (2) Non-gonadal is more often referred to as somatic. I would encourage the authors to use this more common term for accessibility.

      We have adopted this term.

      (3) Figure 5 is interesting, however, it is difficult to know whether the decreased bimodality in humans compared to mice is biological or technical due to the differences in the underlying data. For example, the mouse samples tightly controlled age and environmental conditions within each species. It is not possible to do that with human samples, and there are very good reasons to think that these factors will affect variance in both sexes.

      Yes, this is certainly true, and we also know this from other comparative data between mice and humans. Still, the difference reflects the reality of human samples as opposed to the artificial, controlled conditions of laboratory mice. We now take this up in the discussion.

      (4) Line 273. The large numbers of cells needed for single-cell analysis require that most studies pool multiple samples, however these pools are helpful in themselves. This approach was used by https://doi.org/10.1093/evlett/qrad013 to quantify the degree of sex-bias within cell types across multiple tissues and to compare how bulk and single-cell sex-bias measures compare. Sex-bias in some somatic cell types was very high, even when bulk sex-bias in those tissues was not. This suggests that the bulk data the authors use in this study may in fact obscure the pattern of sex-bias.

      Yes, we agree, and this is exactly how we did the analysis and interpretation, based on the cited paper.

      (5) Line 379: "Total RNAs were" should be "Total RNA was".

      Corrected

      References cited in this review and which should be included in the manuscript :

      Sam L Sharpe, Andrew P Anderson, Idelle Cooper, Timothy Y James, Alexandra E Kralick, Hans Lindahl, Sara E Lipshutz, J F McLaughlin, Banu Subramaniam, Alicia Roth Weigel, A Kelsey Lewis, Sex and Biology: Broader Impacts Beyond the Binary, Integrative, and Comparative Biology, Volume 63, Issue 4, October 2023, Pages 960-967.

      Included

      Pointer MA, Harrison PW, Wright AE, Mank JE (2013) Masculinization of Gene Expression Is Associated with Exaggeration of Male Sexual Dimorphism. PLOS Genetics 9(8): e1003697.

      Included

      Erica V Todd, Hui Liu, Melissa S Lamm, Jodi T Thomas, Kim Rutherford, Kelly C Thompson, John R Godwin, Neil J Gemmell, Female Mimicry by Sneaker Males Has a Transcriptomic Signature in Both the Brain and the Gonad in a Sex-Changing Fish, Molecular Biology and Evolution, Volume 35, Issue 1, January 2018, Pages 225-241.

      Included

      Cardoso SD, Gonçalves D, Goesmann A, Canário AVM, Oliveira RF. Temporal variation in brain transcriptome is associated with the expression of female mimicry as a sequential male alternative reproductive tactic in fish. Mol Ecol. 2018; 27: 789-803.

      Included

      Dean, R., Wright, A.E., Marsh-Rollo, S.E., Nugent, B.M., Alonzo, S.H. and Mank, J.E. (2017), Sperm competition shapes gene expression and sequence evolution in the ocellated wrasse. Mol Ecol, 26: 505-518.

      Included

      Emilie C. Snell‐Rood, Amy Cash, Mira V. Han, Teiya Kijimoto, Justen Andrews, Armin P. Moczek, DEVELOPMENTAL DECOUPLING OF ALTERNATIVE PHENOTYPES: INSIGHTS FROM THE TRANSCRIPTOMES OF HORN‐POLYPHENIC BEETLES, Evolution, Volume 65, Issue 1, 1 January 2011.

      Not included, since its technical approach is not really comparable.

      Harrison PW, Wright AE, Zimmer F, Dean R, Montgomery SH, Pointer MA, Mank JE (2015) Sexual selection drives evolution and rapid turnover of male gene expression. Proceedings of the National Academy of Sciences, USA 112: 4393-4398.

      Included

      Mathias Scharmann, Anthony G Rebelo, John R Pannell (2021) High rates of evolution preceded shifts to sex-biased gene expression in Leucadendron, the most sexually dimorphic angiosperms eLife 10:e67485.

      Included

      Sexually Antagonistic Selection, Sexual Dimorphism, and the Resolution of Intralocus Sexual Conflict. Robert M. Cox and Ryan Calsbeek, The American Naturalist 2009 173:2, 176-187.

      Included

      Ingleby FC, Flis I, Morrow EH. Sex-biased gene expression and sexual conflict throughout development. Cold Spring Harb Perspect Biol. 2014 Nov 6;7(1):a017632.

      Included

      Oliver JC, Monteiro A 2011. On the origins of sexual dimorphism in butterflies. Proc Biol Sci 278: 1981-1988.

      Included

      Iulia Darolti, Judith E Mank, Sex-biased gene expression at single-cell resolution: cause and consequence of sexual dimorphism, Evolution Letters, Volume 7, Issue 3, June 2023, Pages 148-156.

      Included

      Reviewer #2 (Recommendations For The Authors):

      I am concerned the smoothed density plots in Figure 4 may be providing a misleading sense of the distributions since each distribution is inferred from only 9 values. A boxplot might better represent the data to the reader.

      Boxplots with 9 values are much more difficult for a reader to interpret; this is the very reason why one tends to smooth them. In this way, they also become similar to the standard plots that are used for showing morphological variation between the sexes. Note that the original data for the individual values are available, if these are of special interest in some cases. In addition, our new “mosaic” analysis (Figure 6) provides another presentation for readers.

      Line 235: "the overall numbers are lower" I assume this is the number of genes included in the analyses, but this should be explicitly stated.

      Clarified in the text

      The analysis of gene expression from different brain regions in control individuals from the Alzheimer's study (line 273) suffers from low power and it is not clear to me how much taking samples from different brain regions eliminates the issue of different cell types within a sample (the stated motivation for this analysis). While I support publishing negative results, this section does not feel like it adds much to the manuscript and could be cut in my opinion.

      This is actually a study on single cell types, differentiating each of them. We are sorry that the text was apparently unclear about this. Given that there are studies that show the importance of looking at single cell data, we still think that is a suitable analysis. We have updated the text to make it clearer.

      It might be useful to separate out X-linked genes from autosomal genes to see if they show consistent patterns with regard to sex-bias.

      We have added this information in suppl. Table S2 and include some description in the text.

      Reviewer #3 (Recommendations For The Authors):

      Comments follow the order of the Results section:

      (1) The latter half of this line in the Methods is too vague to be helpful: "We have explored a range of cutoffs and found that a sex-bias ratio of 1.25-fold difference of MEDIAN expression values combined with a Wilcoxon rank sum test and Benjamini-Hochberg FDR correction (using FDR <0.1 as cutoff) (Benjamini & Hochberg, 1995) yields the best compromise between sensitivity and specificity". What precisely is meant by "the best compromise between sensitivity and specificity"?

      We now explain that this choice was based on pre-tests comparing randomized with actual data. We agree, however, that this is ultimately a subjective decision; there is no single standard in the literature, especially when somatic organs are included. We consider our criteria rather stringent.

      (2) The 1.25 number for sex bias is, ultimately, an arbitrary cut-off. It is common in this literature to choose some arbitrary level and, in this sense, the authors are following common practice. The choice of 1.25 should be stated in the main text as it is a lower (but not unreasonable) value than has been used in many other papers.

      It is not only the cutoff, but also the Wilcoxon test and FDR correction that defines the threshold. See also comment above.
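      The sex-bias criterion quoted in comment (1), a 1.25-fold difference of median expression combined with a Wilcoxon rank-sum test and Benjamini-Hochberg correction at FDR < 0.1, can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the p-value uses the normal approximation with tie correction (an exact test, which `scipy.stats.mannwhitneyu` offers via `method="exact"`, would give somewhat different values with nine animals per sex), and all function names and example data are hypothetical.

```python
import math
from statistics import median


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) p-value.

    Normal approximation with tie correction (assumption; exact tests differ
    for small samples).
    """
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        tie_sizes.append(j - i + 1)
        i = j + 1
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:
        return 1.0  # all values tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up, kept monotone)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify_genes(female, male, fold=1.25, fdr=0.1):
    """female, male: dict gene -> TPM values (one per animal) for one organ."""
    genes = list(female)
    qvalues = benjamini_hochberg([rank_sum_pvalue(female[g], male[g]) for g in genes])
    calls = {}
    for gene, q in zip(genes, qvalues):
        med_f, med_m = median(female[gene]), median(male[gene])
        if q < fdr and med_f > 0 and med_f >= fold * med_m:
            calls[gene] = "female-biased"
        elif q < fdr and med_m > 0 and med_m >= fold * med_f:
            calls[gene] = "male-biased"
        else:
            calls[gene] = "unbiased"
    return calls


# Invented TPM values for nine animals per sex.
female = {"g1": list(range(10, 19)), "g2": [5] * 9, "g3": list(range(1, 10))}
male = {"g1": list(range(1, 10)), "g2": [5] * 9, "g3": list(range(10, 19))}
print(classify_genes(female, male))
# {'g1': 'female-biased', 'g2': 'unbiased', 'g3': 'male-biased'}
```

      The sketch makes the reply's point concrete: a gene is called sex-biased only if it passes both the fold-change threshold and the FDR-corrected test, so the 1.25 value alone does not define the classification.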

      (3) In truth, dimorphism is continuous rather than discrete (i.e, greater or less than 1.25 fold different). Thus, where possible it would be useful to present results in a fashion that allows readers to see the continuous range of ratios rather than having to worry about whether the patterns are due to the rather arbitrary choices of how genes were binned into sex-bias categories.

      It is necessary to work with cutoffs in such cases, and this is the usual practice in such papers. However, we now provide plots of the female/male ratio distributions in Figure 1, figure supplement 1.

      a) Number of genes that are female- / male-biased. I would like to be able to see a version of Figure 1 showing the full distribution of TPM ratios rather than bar graphs of the numbers of (arbitrarily defined) female- and male-biased genes. This will be, of course, a larger figure (a full distribution rather than 2 bars for each species for each organ) and so could be relegated to Supplementary Material (assuming the message of that figure is the same as the current Figure 1).

      This is a very unusual request, given that no other paper has done this either. It would indeed result in an unmanageable figure size, or in many separate figures that would be difficult to scrutinize. Note that there would be one plot of two (female and male) TPM distributions for each sex-biased gene in each organ and each taxon, leading to hundreds of thousands of plots. We think that providing the general distributions as plots (see above), and the original data as supplements, is sufficient.

      b) Turnover of genes with sex bias. This important issue is addressed in Figure 2. First, it is not precisely clear what "percentages of sums of shared genes for any pairwise comparison" in Figure 2 legend means and no further detail is given in the Methods; this must be made clearer or the info in Figure 2 is meaningless. Regardless, this approach again relies heavily on the arbitrary criterion of defining sex-bias. Thus, I would like to see correlation plots of the log(TPM ratio) between taxa as done in the classic multispecies fly paper of Zhang et al. 2007. In Figure 2 it is quite clear that male-biased genes evolve with respect to sex bias more rapidly than female-biased genes.

      We have provided a better explanation of this analysis. Note that the Zhang et al. 2007 paper was not focussing on somatic expression and covers a much broader evolutionary spectrum. Hence, the results are not comparable. Also, we doubt that it would be so helpful to generate a huge figure with all these plots.
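      A sketch of the reviewer's suggestion: for genes expressed in two taxa, compute a continuous log2 female/male ratio of median TPM in each taxon and correlate these ratios across taxa, instead of counting genes that pass a cutoff. The pseudocount of 1, the use of medians and the Pearson correlation are assumptions made for illustration, not a reconstruction of Zhang et al. 2007; the function names and values are hypothetical.

```python
import math
from statistics import mean, median


def log2_sex_ratio(female_tpm, male_tpm, pseudocount=1.0):
    """log2 of (median female TPM + pseudocount) / (median male TPM + pseudocount)."""
    return math.log2((median(female_tpm) + pseudocount) / (median(male_tpm) + pseudocount))


def pearson(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical log2 F:M ratios for the same three genes in two taxa.
taxon_a = [log2_sex_ratio([3, 3, 3], [1, 1, 1]), log2_sex_ratio([1, 1, 1], [3, 3, 3]), 0.2]
taxon_b = [0.8, -0.9, 0.1]
print(round(pearson(taxon_a, taxon_b), 3))
```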

      (4) Is there a simpler explanation for the results in the "Variance patterns" section? The total variance for any variable can be decomposed into the variance within and among "groups". If we use "sex" as the group, then there are genes - labelled sex-biased genes - that were identified as such, in essence, because they have high among-group variance. Given that we then know a priori at the start of this section of sex-biased genes have high among-group variance, is it at all surprising that they have higher total variance than the unbiased genes (which we know a priori have low among-group variance)? Perhaps I misunderstood the point of this section. Maybe it would be more meaningful to examine the WITHIN-SEX variance (averaged across the two sexes) instead.

      We did calculate IQR/median (“normalized variance”) with the nine mice for each gene and each sex in each organ, hence sex is not a variance factor in this calculation. The algorithm steps are outlined in suppl. Table S17. We have now also added a variance calculation for reciprocal gene sets and added an extended discussion of these results.
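      The within-sex measure described in this reply, IQR/median computed separately for the nine animals of each sex, can be sketched as below; because each value uses one sex only, the difference between the sexes does not enter it. The quartile convention (linear interpolation, `method="inclusive"` in Python's `statistics.quantiles`) is an assumption, since the authors' exact steps are outlined in suppl. Table S17, and the TPM values are invented.

```python
from statistics import median, quantiles


def normalized_variance(values):
    """IQR / median of one gene's expression in one sex and one organ."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / median(values)


# Invented TPM values for nine females and nine males.
females = list(range(1, 10))
males = list(range(11, 20))
print(normalized_variance(females))          # 0.8
print(normalized_variance(males))            # about 0.267
print(normalized_variance(females + males))  # 0.95: pooling adds the sex difference
```

      The last line shows the concern raised in the comment: a pooled measure mixes between-sex and within-sex variation, whereas the per-sex values do not.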

      (5) Analysis of alpha for sex-biased genes. This was the most interesting part of this manuscript to me.

      (a) More information about what SNVs were used is required.

      i. Were only sites where SPR was fixed used? (If not, how was polarization done?)

      ii. Were sites only considered diverged if they were fixed for different bases in DOM and MUS? (If not, what was the criteria?)

      iii. Using, say, DOM as the focal species, a site must be polymorphic in DOM. But did its status (polymorphic/fixed) in MUS matter?

      We have added a more detailed description of this to the Methods section. To answer the three questions directly: (i) yes; (ii) yes; (iii) no, because DOM and MUS are two subspecies of Mus musculus that separated recently, so a variant might have arisen before their separation, and there might be gene flow between them.
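      One reading of answers (i) to (iii) as a per-site filter is sketched below, taking DOM as the focal taxon. It is a hypothetical illustration, not the authors' code: the function name is invented, each argument simply lists the alleles observed in a taxon, and the annotation of sites as synonymous or nonsynonymous, which the MK counts also need, is left out.

```python
def classify_site(spr_alleles, focal_alleles, other_alleles):
    """Classify one coding site for MK-type counts with a focal taxon (e.g. DOM).

    Each argument is an iterable of the alleles observed in SPR (outgroup),
    the focal taxon and the other taxon (e.g. MUS). Returns "polymorphism",
    "divergence", or None when the site is not used.
    """
    spr, focal, other = set(spr_alleles), set(focal_alleles), set(other_alleles)
    if len(spr) != 1 or not focal or not other:
        return None  # (i) only sites fixed in SPR are used; missing data skipped
    if len(focal) > 1:
        return "polymorphism"  # (iii) status in the other taxon is not considered
    if len(other) == 1 and focal != other:
        return "divergence"  # (ii) fixed for different bases in both taxa
    return None


print(classify_site("AAAA", "AG", "GG"))  # polymorphism
print(classify_site("AA", "AA", "GG"))    # divergence
print(classify_site("AG", "AA", "GG"))    # None
```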

      (b) A particularly interesting part of the analysis is the investigation of alpha for genes that are NOT sex-biased in one taxa but are sex-biased in the other. At the moment (as I understand it), alpha is only calculated for these genes in the taxa where they are NOT sex-biased (and this alpha value can be compared to the alpha of sex-biased genes and of unbiased genes in that taxa). I would like to see both sets of genes (set 1: those sex-biased in MUS and not in DOM; set 2: those sex-biased DOM and not in MUS) analyzed in each of the 2 species, with results presented in a 2x2 table.

      By definition of these categories, these genes are sex-biased in the respective other taxon, hence the values are already in the table. They are named as “reciprocal”.

      (c) No confidence intervals are given for the alpha values, despite the legend of Figure 3 referring to them.

      These were accidentally omitted; we have now included the full table in suppl. Table S6, and Figure 3 was modified to show violin plots of the bootstrap distributions.
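      To illustrate how such intervals can be obtained, the sketch below computes alpha in its common summed form, 1 - (Ds * Pn) / (Dn * Ps), from nonsynonymous and synonymous divergence (Dn, Ds) and polymorphism (Pn, Ps) counts, and derives an approximate 95% percentile interval by resampling genes with replacement. Whether the study used exactly this estimator and how many bootstrap replicates it drew is not stated here, so the formula choice, the 1,000 replicates, the function names and the counts are assumptions.

```python
import random


def alpha(dn, ds, pn, ps):
    """Proportion of adaptive substitutions, 1 - (Ds * Pn) / (Dn * Ps)."""
    return 1 - (ds * pn) / (dn * ps)


def bootstrap_alpha(genes, replicates=1000, seed=1):
    """Approximate 95% percentile interval for alpha, resampling genes.

    genes: list of per-gene (Dn, Ds, Pn, Ps) counts; counts are summed over
    the resampled genes before alpha is computed. Assumes at least some
    replicates have Dn > 0 and Ps > 0.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        sample = [rng.choice(genes) for _ in genes]
        dn, ds, pn, ps = (sum(column) for column in zip(*sample))
        if dn > 0 and ps > 0:  # alpha is undefined otherwise
            estimates.append(alpha(dn, ds, pn, ps))
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[int(0.975 * len(estimates)) - 1]
    return lower, upper


# Invented counts for four genes.
genes = [(3, 5, 1, 6), (2, 4, 2, 5), (4, 3, 1, 4), (1, 6, 0, 3)]
dn, ds, pn, ps = (sum(column) for column in zip(*genes))
print(round(alpha(dn, ds, pn, ps), 3))  # 0.6
print(bootstrap_alpha(genes))
```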

      The author's creation and use of a "sex-bias index" (SBI). My greatest skepticism of this manuscript is with respect to the value of their manufactured index, SBI. Of course, it is possible to create such an index but does this literature really need this index or does this just add to the "clutter" in the literature for this field? Is it helping to illuminate important patterns? This index is presumably some attempt to quantify how "male-like" or "female-like" overall expression is for a given individual (for a given organ). It is calculated as SBI = (MEDIAN of all female-biased tpm) - (MEDIAN of all male-biased tpm).
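      The SBI definition quoted above can be written for a single individual and organ as follows. The dictionary layout, gene names and values are hypothetical; the gene lists would come from the sex-bias classification of that organ and taxon.

```python
from statistics import median


def sex_bias_index(sample_tpm, female_biased, male_biased):
    """SBI of one individual in one organ.

    sample_tpm: dict gene -> TPM for this individual.
    female_biased, male_biased: gene lists from the sex-bias classification.
    SBI = median TPM of female-biased genes - median TPM of male-biased genes.
    """
    return (median(sample_tpm[g] for g in female_biased)
            - median(sample_tpm[g] for g in male_biased))


# Hypothetical individual; the unbiased gene "u1" does not enter the index.
animal = {"f1": 4.0, "f2": 6.0, "m1": 1.0, "m2": 3.0, "u1": 50.0}
print(sex_bias_index(animal, ["f1", "f2"], ["m1", "m2"]))  # 3.0 = 5.0 - 2.0
```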

      (6) A main result that comes from this is that the sexes tend to overlap for these values for most nongonad tissues but are clearly distinct for gonadal tissues. I do not think this result would come as a surprise to almost anyone and I'm far from convinced that this metric is a good way to quantify that point. Let's consider testes vs. ovaries. Compared to non-gonadal tissues, I am reasonably certain that not only are there many more genes that are classified as "sex-biased" in gonads but also the magnitude of sex-bias among these genes is typically much greater than it is for the so-called sex-biased genes in nongonadal tissue (density plots requested in #3a would make this clear). In other words, males and females are, on average, very different with respect to expression in gonads so even allowing for variation within each sex will still result in a clear separation of all individuals of the two sexes. In contrast, males and females are, on average, much less different in, say, heart so when we consider the variation within each sex, there is overlap. One could imagine a variety of different metrics which could be used to make this point. The merits of "SBI" are unclear. It is a novel metric and its properties are poorly understood. (A simple alternative would be looking at individual scores along the axis separating mean/median males and females; almost certainly, for gonads, this would be very similar to PC scores for PC1.)

      As throughout the text, we use gonadal comparisons only as a general reference, not as the main result. The main result we stress is the fast turnover of these patterns, including the shift from binary to overlapping distributions for kidney and liver in mice. We consider this a new finding. Even if it would "not come as a surprise to almost anyone", is it not valuable that one no longer has to guess but finally has real data on this?

      We have now also added a mosaic analysis to show that the SBI can be used as summary measure in different presentations.

      The use of a single PC axis is not a good alternative, since it discards the information from the other axis.

      We have now included an explicit discussion on the usefulness of the SBI.

      (7) For simplicity, let's assume all males are identical and all females are identical. Let's imagine that heart and kidney have the exact same set of sex-biased genes. There are 20 female-biased genes; they all happen to be identical in expression level (within tissue) and look like this:

      Organ    Female TPM   Male TPM   TPM ratio (F:M)
      Heart    4            2          2
      Kidney   40           20         2

      And there are 20 male-biased genes that look like this:

      Organ    Female TPM   Male TPM   TPM ratio (F:M)
      Heart    1            3          1/3
      Kidney   10           30         1/3

      Most people would describe these two tissues as equally sex-biased.

      However, the SBIs would be:

      Organ    Female SBI     Male SBI        Sex difference (F - M)
      Heart    4 - 1 = 3      2 - 3 = -1      4
      Kidney   40 - 10 = 30   20 - 30 = -10   40

      Is it a desirable property that by this metric these two tissues have wildly different SBI values for each sex as well as for the difference between sexes? (At the very least, shouldn't you make readers aware of these strange properties of SBI so they can decide how much value they put into them?)
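      The reviewer's arithmetic can be checked with a short script that applies the quoted SBI definition to the toy heart and kidney values and also reports the log2 F:M ratio of each gene category. It reproduces the SBI table above: the ratios are identical in the two organs, whereas the SBI values differ tenfold because they scale with expression level. The data structure and function name are illustrative only.

```python
import math
from statistics import median

# Reviewer's toy example: per organ, the (female TPM, male TPM) pair shared by
# all 20 female-biased genes and by all 20 male-biased genes.
TOY = {
    "heart": {"female_biased": (4, 2), "male_biased": (1, 3)},
    "kidney": {"female_biased": (40, 20), "male_biased": (10, 30)},
}


def toy_summary(organ, n_genes=20):
    """Return (female SBI, male SBI, F - M difference, log2 F:M ratios)."""
    fb_f, fb_m = TOY[organ]["female_biased"]
    mb_f, mb_m = TOY[organ]["male_biased"]
    female_sbi = median([fb_f] * n_genes) - median([mb_f] * n_genes)
    male_sbi = median([fb_m] * n_genes) - median([mb_m] * n_genes)
    ratios = (math.log2(fb_f / fb_m), math.log2(mb_f / mb_m))
    return female_sbi, male_sbi, female_sbi - male_sbi, ratios


for organ in TOY:
    print(organ, toy_summary(organ))
# heart: SBIs 3 and -1, difference 4; kidney: SBIs 30 and -10, difference 40
```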

      Actually, in this example the simple ratio between the expression levels has a strange property, since it does not reflect a much higher expression of the relevant genes in the kidney. The SBI is actually more suitable for making such cases clear. Of course, this is under the assumption that expression level has a meaning for the phenotype, but this is the general assumption for all RNA-Seq experiment comparisons.

      (8) With respect to Figure 4, why do females often have mean SBI values close to zero or even negative (e.g., kidney, mammary glands)? Is this simply because the female-biased genes tend to have lower TPM than the male-biased genes? It seems that the value zero for this metric is really not very biologically meaningful because this metric is a difference of two things that are not necessarily expected to be equal.

      This reflects the additional information about expression levels that the SBI captures (see our response above). However, we recognize that this can be confusing, so we have now added a rescaling step so that these plots focus entirely on the variance information.

      (9) Interpreting variances. A substantial fraction of the latter half of the manuscript focuses on interpreting variances among individual samples. This is problematic because there is no replication within individuals (i.e., "repeatability"), so it is impossible to infer to what extent the observed variance among individuals of a given group (e.g., among females) is due to true biological differences among individuals or simply to noise (i.e., "measurement error" in the broad sense). Is the larger variance for mammary glands than for liver or gonads just due to measurement error? What is the evidence?

      This was, of course, a major issue when microarrays were used for transcriptome studies. However, the first systematic RNA-Seq studies already showed that technical replicability is so high that technical replicates are not required. For this reason, practically all RNA-Seq studies are done without technical replicates.

      (10) Because I have little confidence in the SBI metric (#7-8) and in interpreting within sex variances (#9), I found little value in the human results and how SBI distributions (and degree of overlap between sexes) compare between humans and mice.

      We disagree. The current published view is that there are thousands of sex-biased genes in humans, which has implications for gender-specific medicine (Oliva et al. 2020). Our results show a much more nuanced picture in this respect.

      (11) I found even less value in the single-cell data. It too suffers from the issues above. Further, as the authors more or less state, the data are too limited to say much of value here. It is impossible to tell to what extent the results are simply due to data limitations.

      As we point out, these data are still valuable: they are good enough to exclude the possibility that only a small set of cells drives the overall pattern across an organ. We have further clarified this in the text.

      (12) The code for data analysis should be posted on GitHub or some other repository.

      The code for the sex-biased gene detection and analysis has been posted on GitHub (see Code availability in the manuscript).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1:

      Weaknesses:

      As this paper only uses anatomical analyses, no functional interpretations of cell function are tested.

      The aim of this paper was to describe the ultrastructural organization of compound eyes in the extremely small wasp Megaphragma viggianii. The authors successfully achieved this aim and provided an incredibly detailed description of all cell types with respect to their location, volume, and dimensions. As this is the first of its kind, the results cannot easily be compared with previous work. The findings are likely to be an important reference for future work that uses similar techniques to reconstruct the eyes of other insect species. The FIB-SEM method used is being used increasingly often in structural studies of insect sensory organs and brains and this work demonstrates the utility of this method.

      Thank you for your positive assessment of our work. Unfortunately, the extremely small size of the animal makes it difficult to test our functional interpretations with electrophysiological methods. Studies based on three-dimensional ultrastructural datasets obtained by vEM have only just started to appear, and we hope that much more data will become available for comparison in the near future.

      Reviewer #2:

      Thank you for your work and for your positive assessment of our manuscript.

      Reviewer #3:

      Weaknesses:

      The claim that the large dorsal part of the eye is the dorsal rim area (DRA), supported by anatomical data on rhabdomere geometry and connectomics in authors' earlier work, would eventually greatly benefit from additional evidence, obtained by immunocytochemical staining, that could also reveal a putative substrate for colour vision. The cell nuclei that are located in the optical path in the DRA crystalline cone have only a putative optical function, which may be either similar to pore canals in hymenopteran DRA cornea (scattering) or to photoreceptor nuclei in camera-type eyes (focussing), both explanations being mutually exclusive.

      We thank the Reviewer for the positive assessment of our study and for the detailed analysis of our manuscript. Your comments and recommendations are greatly appreciated and have helped us improve the text. We agree that immunocytochemical methods could strengthen our findings and supply additional evidence, but this is not technically possible at present: Megaphragma is a very challenging model organism for such methods. We are currently optimizing the staining protocol, which is needed because of the high level of autofluorescence and the insufficient penetration of dyes into the samples.

      Recommendations for the authors:

      Reviewer #1:

      I do not have any major concerns about the content of the paper.

      There are some minor spelling and grammatical errors throughout the text but these can be identified most readily using a spelling/grammar check.

      We have revised the text, checked the spelling, and fixed grammatical errors throughout.

      I suggest using consistent capitalization of the term 'non-DRA', as it sometimes appears as 'Non-DRA' in the text.

      We have made the term “non-DRA” consistent throughout the text. Thank you.

      Also, check carefully the spelling of headings in the tables as there are a few mistakes in Table 1 and 5 in particular.

      The spelling errors in the table headings have been fixed.

      Figure 7 legend: an explanation of the abbreviation RPC should be added.

      We have done so.

      Reviewer #2:

      (1) The paper presents the data in great detail. However, since this is the first time the technique has been applied to whole insect eyes, even if on a small insect, it would be worth outlining in the methods section what innovations in staining, scanning, or sample preparation allowed these improvements, and a roadmap for extending this method to larger insects if possible.

      The whole method, including sample preparation, staining, and scanning, was described in full detail in our previous paper (Polilov et al., 2021). Given the complexity of the methodology, we felt it unnecessary to repeat every stage of the technique here and therefore described it more briefly.

      (2) The optical modelling needs a statement in the discussion providing a disclaimer on parameters such as sensitivity. Anatomical measurements can provide limits and some measure, but the inherent optics are also key, and it is worth qualifying these values as estimates and measurements that give a sense of the variation in morphology; only when coupled with optical and potentially neural measurements could one confirm the true sensitivity and acceptance angle.

      In the absence of experimental data or precise computational models of Megaphragma vision, we have tried to discuss the functions of structures cautiously, basing our interpretations on their morphology, ultrastructure, first-order visual connectome, and analogies with other species. This caution is reflected in the methods and in those sections of our paper that contain functional interpretations.

      Reviewer #3:

      (1) The finding that the CNS neurons are enucleated, while the compound eye contains cell nuclei, deserves further comment. I would confidently say that the optical demands of a miniaturized compound eye (the minimal size of the optics due to diffraction, the rhabdomere size, and the minimal thickness of optically insulating granules) are such that further cellular miniaturization is not possible, and the minimal sizes even render the cells that build the eye sufficiently large to accommodate cell nuclei. This is in my opinion a parsimonious explanation, yet speculative, and I leave it up to you to embrace it or not.

      We agree with the Reviewer and recognize the limiting factors and optical demands of a miniaturized compound eye. According to our data, nuclei occupy a considerable volume in the eye (the cells of the compound eye contain more nuclei than the whole brain), and on average the cell volume is larger than in Trichogramma, which is minute but larger than Megaphragma. However, as the Reviewer rightly notes, this explanation is speculative, so we prefer not to include it.

      (2) Our current understanding of DRA optics and function is limited and I claim that your interpretation of the cell nuclei in the DRA dioptrical apparatuses is inappropriate. Please consider a few articles on hymenopteran DRA, starting with the one below and the citing literature:

      Meyer, E.P., Labhart, T. Pore canals in the cornea of a functionally specialized area of the honey bee's compound eye. Cell Tissue Res. 216, 491-501 (1981). https://doi.org/10.1007/BF00238646

      Honeybee DRA has a milky appearance under a stereomicroscope and can be discerned from the outside. This is due to pore canals in the cornea. I happen to be studying this exact structure and its function right now. I found that the result of those canals is not so much the extended receptor acceptance angles, but rather a minimized light gain. This is counterintuitive, but think of the following. The DRA photoreceptors must encode the limited range of polarization contrasts with a maximal working dynamic range (= voltage) of the photoreceptors, which results in a very steep stimulus-response curve.

      Physiologically such a curve is due to very high transduction gain and a high cell input resistance. In most of the retina, small contrasts are transcoded by LMC neurons, but DRA receptors are long visual fibres and must do the job themselves. The skylight intensity (especially antisolar, where the polarized pattern is maximal) varies little during the day. Hence, the DRA receptors work almost at a fixed intensity range. In order to prevent receptor saturation and keep steep contrast coding, the corneal lenses in DRA have a built-in diffusor ring, which diminishes the light influx. Unfortunately, I have yet to publish this and I may be wrong, of course. But if I look into your data, I see consistently smaller corneal lenses and crystalline cones in the DRA, plus the cell nuclei obstructing the incident light. I think this is similar to the optics of honeybee DRA.

      You do not support your claim that the nuclei additionally focus light by optical calculations, but cite literature on camera-type eyes, which is not OK.

      In any case, I think it is fair to limit the discussion by saying that the nuclei may have an optical role. Further evidence from hymenopteran and vertebrate literature is controversial. “so that the nuclei act as extra collecting lenses, as was reported for rod cells of nocturnal vertebrates (Solovei et al., 2009; Błaszczak et al., 2014)” - please consider omitting this.

      We thank the Reviewer for this advice. We have rewritten the text to omit the comparison with vertebrates, but have kept the citation as an illustration that nuclei can perform an optical role.

      “Since the nuclei in DRA and non-DRA ommatidia are arranged differently in cone cells, we suggest that the nuclei of the cone cells of DRA ommatidia in M. viggianii perform some optical role, facilitating the specialization of this group of ommatidia. The optical function for nuclei was described for rod cells of nocturnal vertebrates, where chromatin inside the cell nucleus has a direct effect on light propagation (Solovei et al., 2009; Błaszczak et al., 2014; Feodorova et al., 2020).”

      (3) Please consider comparing the structure and function of ectopic receptors with the eyelet in Drosophila (i.e. https://doi.org/10.1523/JNEUROSCI.22-21-09255.2002 )

      We thank the Reviewer for this advice and have added the following comparison to the text:

      “The position of ePR, their morphology and synaptic targets look similar to the eyelet (extraretinal photoreceptor cluster) discovered in Drosophila (Helfrich-Förster et al., 2002). Eyelets are remnants of the larval photoreceptors, Bolwig’s organs in Drosophila (Hofbauer and Buchner, 1989). Unlike Drosophila, Trichogrammatidae are egg parasitoids and their central nervous system differentiation is shifted to the late larva and even early pupa (Makarova et al., 2022). According to the available data on the embryonic development of Trichogrammatidae, no photoreceptor cells were found during the larval stages (Ivanova-Kazas, 1954, 1961).”

      Therefore, the question of this analogy remains open.

      (4) Minor remarks:

      “but also to trace the pathways that connect the analyzer with the brain.” - I find the word analyzer a bit stretched here; sure, the DRA is polarization analyzer, but if the main retina was monochromatic, it would only be a detector, not an analyzer.

      The sentence was changed according to the Reviewer’s advice.

      Table I: thikness -> thickness, wigth -> width

      We have fixed these misprints.

      “The cross-section of Non-DRA ommatidia has a strongly spherical shape” - perhaps circular, not spherical. And not necessary to say “strongly”

      The wording was changed according to the Reviewer’s advice.

      “which can be rarely visualized in the cell's projections not far from the basement membrane.” - I'd suggest saying “which are nearly absent in retinula axons”

      The wording was changed according to the Reviewer’s advice.

      “The pigment granules of the retinula cells have an elongated nearly oval shape” - please consider replacing 'elongated nearly oval' with 'prolate' (try googling for “prolate” or “oblate spheroids”; the adjective describes precisely what you wanted to say)

      We thank the Reviewer for this piece of advice but prefer to leave our original phrasing, because it is more readily understandable.

      “The results of our morphological analysis of all ommatidia in Megaphragma are consistent with the light-polarization related features in Hymenoptera and other insects” - please add citations, see my comment on the DRA above.

      We have added the citations according to the Reviewer’s advice.

      “The group of short PRs (R1-R6)” - please consider renaming into “short visual fibre photoreceptors” (as opposed to “long visual fibre PRs”; hence SVFs and LVFs). This naming is quite common.

      The naming was changed according to the Reviewer’s advice.

      “The total rhabdom shortening in M. viggianii ommatidia probably favors polarization and absolute sensitivity,” - please see comments on DRA. Wide rhabdom means also a wider acceptance angle.

      Shortening of DRA rhabdoms does not result in their widening compared to other rhabdoms, so it is difficult to say how this may be related to sensitivity. The comments on DRA given earlier have been taken into account.

      “Ommatidia located across the diagonal area of the eye are more sensitive to light” - I don't understand what is diagonal area.

      We have deleted the sentence.

      “Estimated optical sensitivity of the eyes very close to those reported for diurnal hymenopterans with apposition eyes (Greiner et al., 2004; Gutiérrez et al., 2024) and possess around 0.19 ± 0.04 μm² sr. M. viggianii have reasonably huge values of acceptance angle Δρ, and thus should result in a low spatial resolution” - please correct English here. “eyes IS very close”, “should result in a low”

      The grammatical errors were fixed.

      Table 6 legend: “SPC - secondary pigment cells.” -> “SPC – secondary pigment cells.”

      Citation “(Makarova et al., 2025).” - probably 2015

      The typos were fixed.

      Methods, FIB-SEM: I can't understand the sentence “The volumetric data of lenses and cones, some linear measurements (lens thickness, cone length, cone width, curvature radius) and to visualize the complete 3D-model of eye we use (measure or reconstruct) the elements from another eye (left).”

      The sentence is a continuation of the previous one. We have rewritten it as follows to clarify its meaning and moved it to the 3D reconstruction section:

      “The right eye, on which the reconstruction was performed, has several regions damaged during milling (see Appendix 1C), which hinder complete reconstruction of the lenses and cones of a few ommatidia. Therefore, for the volumetric data on lenses and cones and for some linear measurements (lens thickness, cone length, cone width, curvature radius), we used (measured or reconstructed) the corresponding elements from the other (left) eye.”

      “The cells of single interfacet bristles were not reconstructed, because of damaging on right eye and worst quality of section on the left.” - please change to “The cells of the single interfacet bristle were not reconstructed, because of damage to the right eye and inferior quality of the sections of the left eye.”

      The text has been changed as follows:

      “The cells of single interfacet bristles were not reconstructed, because of the damage present in the right eye and because of the generally lower quality of this region on the left eye.”

      “Morphometry. Each ommatidia was” -> “Morphometry. Each ommatidium was”

      The grammatical error has been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #3 (Recommendations for the authors):

      Major concerns:

      P.6, lines 223-224: The sentence sounds like the authors produced all the OVGP1s by themselves in their laboratories, which is not completely true. The recombinant human and mouse OVGP1s were purchased from OriGene. It is suggested that the authors should state and explain clearly here which OVGP1 is produced by their laboratories and that recombinant human and mouse OVGP1s were obtained and purchased from Origene.

      This is already stated clearly in the Materials and Methods.

      P6, lines 227-229: The authors stated that "Western blots of the three OVGP1 recombinants indicated expected sizes based on those of the proteins: 75 kDa for human and murine OVGP1 and around 60 kDa for bovine OVGP1 (Fig. 4B and S6)." I pointed out in my last review report that the size of the recombinant human OVGP1 shown by the authors in their manuscript is not in agreement with what has been published previously in the literature regarding the molecular weight of native human OVGP1 as well as that of recombinant human OVGP1. The authors did not address the above concern adequately. In fact, recombinant human OVGP1 was produced a few years ago (Reproduction (2016) 152:561-573), and it has been previously demonstrated that a single protein band of approximately 110-130 kDa was detected for both native human OVGP1 (see Microscopy Research and Technique (1995) 32:57-69) and recombinant human OVGP1 (Reproduction (2016) 152:561-573; Carbohydrate Research (2012) 358:47-55) using antibodies specific for human OVGP1. The molecular weight of the protein core or polypeptide of human OVGP1 is approximately 75 kDa, but the glycosylated form of native human OVGP1 and recombinant human OVGP1 is approximately 110-130 kDa. Therefore, the authors might have been using the recombinant core protein of human OVGP1 instead of the fully glycosylated recombinant OVGP1 in their study. The same concern also applies to the commercially obtained mouse recombinant OVGP1 used by the authors in their study. I would also like to mention that the mature and fully glycosylated OVGP1s in mammals vary in molecular weight (90-95 kDa in domestic animals; 110-150 kDa in primates; 160-350 kDa in rodents). Again, the 75 kDa band of mouse OVGP1 detected by the authors could be the core protein or polypeptide of mouse OVGP1 instead of the fully glycosylated mouse OVGP1.

      As previously mentioned, we used commercially available recombinant human and murine OVGP1 from OriGene, which are produced in mammalian cells, and we produced and purified bovine OVGP1 in mammalian cells ourselves. These proteins should therefore be properly glycosylated. Moreover, we performed the Western blots under conditions favouring the transfer of higher-molecular-weight proteins, ensuring optimal conditions for the assay. We also ran OVGP1 from murine and bovine oviductal fluids on the same blot: during oestrus, the OVGP1 band in oviductal fluid matches the size of the recombinant proteins, and this band is downregulated during anoestrus, confirming that the recombinant proteins have the expected size.

      P.7, lines 236 and 237: Please provide a figure or source to support the statement "...as confirmed by proteomics of the bands along with PEAKS Studio v11.5 search engine peptide identification software."

      The text now includes the number of unique peptides obtained by proteomics for OVGP1 identification relative to all protein groups identified.

      P.7, lines 243 to 245: The statement "...using rabbit polyclonal antibody to human OVGP1 for bOVGP1 and endogenous OVGP1, and mouse monoclonal antibody against Flag (DDK)-tag for hOVGP1 and mOVGP1." is confusing and might be inaccurate. First of all, I wondered why the authors did not use an antibody against bovine OVGP1 for the recombinant bOVGP1 instead of using a rabbit polyclonal antibody to human OVGP1. Secondly, what does the "endogenous OVGP1" refer to in the statement? Thirdly, the authors in their study used the commercially available recombinant human OVGP1 and recombinant mouse OVGP1 purchased from Origene. Based on the data sheet provided by Origene, the tag used for both recombinant human OVGP1 and recombinant mouse is C-Myc/DDK-tag and not Flag-tag. Can the authors explain these discrepancies?

      First, for recombinant bOVGP1 we used the same antibody as in the Western blots of all proteins and oviductal fluids, because our anti-His-tag antibody works only for Western blot and not for immunofluorescence, and we do not have an antibody against bovine OVGP1. For human and murine OVGP1, we had an anti-Flag antibody that works for both Western blot and immunofluorescence, so we used that one. As shown in our figures and supplementary material, the antibody against human OVGP1 works properly for both techniques (Western blot and immunofluorescence). Second, "endogenous OVGP1" refers to the OVGP1 present in the oviductal fluid. Third, as stated in the protein datasheet, the recombinant proteins purchased from OriGene contain a c-Myc tag (EQKLISEEDL), a few additional amino acids, and a DDK tag (DYKDDDDK). The DDK sequence is identical to the Flag-tag sequence (DYKDDDDK). Since the proteins carry both tags, we used the antibody against the Flag (DDK) epitope.

      P12, lines 429-432: The newly added statement at the end of the Discussion saying "Additionally, future studies would be valuable to investigate whether incubating oocytes with oviductal fluid (or OVGP1) could reduce polyspermy in porcine IVF and whether ZPs could be leveraged to naturally enhance sperm selection in human ICSI" is very concerning and requires further attention. The statement reflects that the authors do not keep pace with and do not pay attention to what has been published in the literature regarding porcine and human OVGP1s. In fact, porcine oviduct-specific glycoprotein (OVGP1) has already been reported to reduce the incidence of polyspermy in pig oocytes (Biology of Reproduction (2000) 63:242-250). Porcine oviductal fluid, used in porcine IVF, has also been found to exert a beneficial effect on oocytes by reducing the incidence of polyspermy without decreasing the penetration rate (Theriogenology (2016) 86:495-502). Therefore, the studies deemed valuable by the authors to be investigated in the future were, in fact, already carried out two decades ago by several other laboratories. I am surprised the authors were not aware of this published work in the literature. All the above should have been incorporated in the Discussion.

      We have modified this sentence in the Discussion and included these references.

      Furthermore, as mentioned earlier, recombinant human OVGP1 has also been produced (Reproduction (2016) 152:561-573), and recombinant human OVGP1 has been found to increase tyrosine phosphorylation of sperm proteins, a biochemical hallmark of sperm capacitation, and potentiate the subsequent acrosome reaction (Reproduction (2016) 152:561-573) as well as increase sperm-zona binding (Journal of Assisted Reproduction and Genetics (2019) 36:1363-1377). These earlier findings should be incorporated into the Discussion.

      Thank you for your comment. However, we did not perform any experiments related to tyrosine phosphorylation in this work, and although it is a very interesting topic, it is not directly related to this study.

      P.19, lines 678-683: Since the human and mouse recombinant oviductin proteins were purchased from Origene, the authors should be aware of the fact that these commercially available recombinant OVGP1s might not be fully glycosylated. While I appreciate the fact that the authors wanted to briefly describe how the human and mouse recombinant OVGP1s were prepared by the manufacturer, I strongly suggest that the authors should contact Origene, the manufacturer, for all information regarding the procedures for producing the human and mouse recombinant oviductin proteins. For example, the authors stated on lines 680-681 that "A sequence expressing FLAG-tagged epitope proteins (DYKDDDDK) was cloned into an expression vector." According to the data sheet provided by Origene, it appears that both the human and mouse recombinant oviductin proteins are C-Myc/DDK-tagged and not FLAG-tagged.

      Thank you for your comment. The Flag-tag sequence matches the sequence of the DDK tag given in the datasheet (explained in detail in our previous response). The protein also carries a C-Myc tag. Of the two tags, we chose to detect the Flag (DDK) tag, using an anti-Flag antibody.

      P.19, lines 692-697: The description of the primary and secondary antibodies used for detection of the various recombinant OVGP1s is also very confusing and not clearly presented. For example, it is mentioned here that "...membranes were...incubated with anti-OVGP1 rabbit monoclonal antibody for OVGP1,..". What specifically does "OVGP1" refer to here? The authors then stated that anti-Histamine Tag antibody was used to detect bOVGP1 and mOVGP1 and anti-Flag antibody was used to detect hOVGP1. As pointed out earlier, the human and mouse recombinant OVGP1s were produced using C-Myc/DDK tag and not His-tag or Flag-tag. Can the authors clarify these discrepancies?

      We apologise for the confusing description of the antibodies. This paragraph lists the antibodies used for the Western blots in both figures: anti-human OVGP1 was used for the main figure, which contains the three recombinant proteins and the oviductal fluids, and the anti-Histidine and anti-Flag antibodies were used for the supplementary figure, specifically for recombinant bovine OVGP1 (Histidine tag) and for recombinant murine and human OVGP1 (DDK tag). We have added a clarifying sentence to the text.

      P.31, lines 1143-1149: Figure 10 is not mentioned anywhere in the main text of the manuscript. Rewrite the second half of the sentence "...; being this specificity lost when OVGP1 is heterologous to the ZP (right diagram).", which sounds awkward and is grammatically incorrect.

      Thank you for your comment. The figure is already mentioned in the text, and we have corrected the sentence.

      Other comments: P.1, the statement of "All authors contributed equally to this work" on line 14 can be deleted because detailed and specific contributions from each author are listed in lines 1009-1017 on page 27.

      Both authors contributed equally to this work; this is now stated clearly in the author contributions section.

      P.2, lines 43 and 44: Do the authors mean "sperm-oocyte binding protein" instead of "sperm-oocyte fusion protein" in the sentence? "Fusion protein" is a protein composed of two or more domains encoded by different genes, or a hybrid molecule created by combining two different proteins for various purposes. I believe the term "fusion protein" is wrongly used in the sentence which should be rephrased with a proper term.

      Done.

      P2, line 73: Remove the comma after the word "Both".

      Done.

      P.5, line 179: "...mice ZP..." should be written as "...mouse ZP...".  

      Done.

      P.6, heading of 3rd paragraph on line 207: The term "binding" will be a better term than "fusion" used in the heading because the results do not actually show the fusion of the OVGP1 proteins with the ZP glycoprotein. Instead, binding of the OVGP1 proteins to the ZP occurred.

      Done.

      P.6, lines 215-217: Authors, please provide a reference or references to support the statement "Region A, corresponding to the amino acid end, shows high identity among monotremes, marsupials and placentals."

      The text already cited a review (29) that supports this statement for Figure 4. In addition, we have now included some of the references used to describe the domains in the sequence alignment of Figure S5.

      P.6, line 230 and line 233 on P.7: Authors, please be consistent in the use of either American English or British English. The word "oestrus" is British English whereas "estrus" is American English.

      Done.

      P.7, line 264: The word "sticking" used here means non-specific binding. I believe the authors mean specific binding here. If so, a more appropriate word should be used here instead of "sticking".

      Done.

      P.7, lines 267-269: This newly added sentence sounds very awkward and should be completely rewritten.

      Done.

      P.8, line 288: This reviewer finds it difficult to understand the meaning of the heading. The heading should be rephrased to bring out exactly what the authors want to say in well-written English.

      Done.

      P.8, line 290: The word "would" should be replaced by "could" in the sentence.

      Done.

      P.13, line 437: Authors, please provide the location of Sigma-Aldrich.

      Done.

      P.13, line 457: Here, the authors used "1800 rpm" to indicate the centrifugation speed but used the g-force elsewhere in the Materials and Methods. Please be consistent. The g-force is preferred.

      Done.

      P.14, lines 483-485: The procedure of sacrificing the cats should be provided in the Materials and Methods

      The cats were not sacrificed; they were vasectomized. This is now stated in the text.

      P.17, line 628: "...the ZPs were exposed or no exposed to..." should be written as "...the ZPs were either exposed or not exposed to...".

      Done.

      P.17, line 629: "...each groups were incubated with..." should be "...each group was incubated with...".

      Done.

      P.19, line 700: "As loading control, was used the primary antibody....." is not a complete sentence and it needs to be rewritten.

      Done.

      P.20, lines 744-754: For scanning electron microscopy and image processing, the procedures of prior treatment of the oocytes with and without oviductal fluid and OVGP1 should be included here.

      Done.

      P.21, line 756: It is stated here that "Two hundred isolated ZPs were treated with Clostridium perfringens neuraminidase....". However, it is not clear whether two hundred isolated ZPs of both porcine and murine ZPs were treated. Authors, please clarify.

      We used 200 isolated ZPs of each species, bovine and murine. This is now clarified in the text.

      P.28, lines 1039 and 1040: The author only mentioned the use of bovine and murine sperm here. What about human sperm?

      Done.

      P.29, line 1076: "...in mammalian cells..." is very vague. Be specific what exactly the mammalian cells were.

      Done.

      P.29, line 1079: "Oviductal fluid from ovulated cows or anoestrus cows." is not a complete sentence and it needs to be rewritten.

      Done.

    1. Author response:

      Conflation of control, difficulty and reward rate

      In response to the comment that control is conflated with task difficulty (and thus reward rate), which the reviewer feels is not adequately discussed in the paper, we will expand on this point in our discussion, especially in relation to previous literature. It is important to note, however, that our measure of perceived difficulty was included in the analyses assessing fluctuations in stress and control. Subjective control still had a unique effect on the experience of stress over and above perceived difficulty, suggesting that subjective control explains variance in stress beyond what is accounted for by perceived difficulty. We will also include additional analyses in which win rate (i.e., the percentage of all trials won) is entered as a covariate when assessing the relationship between subjective control, perceived difficulty and subjective stress; these show that win rate does not predict stress, whereas subjective control and perceived difficulty still uniquely predict subjective stress. These results will be added and elaborated on in the discussion.

      Neutral video condition

      In response to the comment that the neutral video condition was not sufficiently active, we believe that any task with action-outcome contingencies would involve a degree of controllability. To better distinguish experiences of control (WS task) from an experience of no/neutral control (i.e., neither high nor low controllability), we decided to use a task in which no actions were required during the task itself, although concentration was still required (attention checks regarding the content of the videos and ratings of the videos).

      The suggestion of having a high arousal video condition would indeed be interesting to test how experiencing ‘neutral’ control and high(er) stress levels preceding the stressor task influences stress buffering and stress relief. This is a good suggestion for future work that we can include in the discussion section.

      The TSST version (online and anticipatory)

      We will add more information on prior literature showing that the Trier Social Anticipatory Stress test has physiological and psychological correlates (e.g. Nasso et al., 2019, Schlatter et al., 2021, Steinbeis et al., 2015), suggesting that anticipation is still a valid stress manipulation even though participants do not perform the actual speech task. Further, the TSST had a significant effect on subjective stress in the expected direction, demonstrating that it was effective at eliciting subjective stress.

      Internal consistency

      We will also split the timepoints in other ways (not just odd/even sliders) to test internal consistency, for example using a random split or a first-half/second-half split.
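      The response does not say how the alternative splits will be scored. As a hedged sketch, assuming each participant's stress ratings are stored as one list of slider values across timepoints, and assuming (not stated in the response) that each half is averaged per participant, correlated across participants and stepped up with the Spearman-Brown formula, the three split schemes could be compared as below. All function names are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r):
    """Step a half-test correlation up to full-length reliability."""
    return 2 * r / (1 + r)


def split_indices(n_items, method="odd_even", seed=0):
    """Return two lists of timepoint indices for the chosen split scheme."""
    idx = list(range(n_items))
    if method == "odd_even":
        return idx[0::2], idx[1::2]
    if method == "first_second":
        half = n_items // 2
        return idx[:half], idx[half:]
    if method == "random":
        rng = random.Random(seed)  # seeded so the random split is reproducible
        rng.shuffle(idx)
        half = n_items // 2
        return sorted(idx[:half]), sorted(idx[half:])
    raise ValueError(f"unknown split method: {method}")


def split_half_reliability(ratings, method="odd_even", seed=0):
    """ratings: one list of slider ratings (one per timepoint) per participant."""
    a, b = split_indices(len(ratings[0]), method, seed)
    # Average each participant's ratings within each half of the timepoints.
    half_a = [sum(r[i] for i in a) / len(a) for r in ratings]
    half_b = [sum(r[i] for i in b) / len(b) for r in ratings]
    return spearman_brown(pearson(half_a, half_b))


if __name__ == "__main__":
    # Illustrative (made-up) ratings for three participants over four timepoints.
    ratings = [[20, 25, 22, 24], [60, 55, 58, 62], [40, 42, 38, 41]]
    for m in ("odd_even", "first_second", "random"):
        print(m, round(split_half_reliability(ratings, m), 3))
```

      Running all three schemes on the same data shows whether the reliability estimate depends on the particular split; seeding the random split keeps the result reproducible.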

      Effect of win-loss domain in Study 2

      We will run additional analyses testing the interaction of Domain (win or loss) with stressor intensity when predicting the stress buffering and stress relief effects. To test whether the loss domain is more valuable at mitigating experiences of stress than the win condition, we will run additional analyses with just the high control conditions (WS task) to test for a Domain*Time interaction, as we cannot test a Control*Domain*Time interaction in the full model given that we do not have ‘Domain’ for the video (neutral control) condition.

      Stress relief analyses

      Regarding the stress relief analyses (timepoints 2 and 3) and ‘baseline’ stress (timepoint 1), we will add to the manuscript that there is no significant difference in stress ratings between the high control and neutral control conditions (collapsed across stress and domain) after the WS/video task, which is why we do not think it is necessary to include baseline stress in the stress relief model. Nevertheless, we will include a sensitivity analysis in the supplementary material to test the Timepoint*Control interaction (of stress relief, timepoints 2 and 3) when including timepoint 1 stress as a covariate.

      Clarity

      We will add more clarity in the methods section regarding within- and between-subject manipulations. We will also add Figure S4 to the main manuscript and expand Figure 1 to include both Studies 1 and 2 and a timeline of when subjective stress was assessed throughout the experiment.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Busch and Hansel present a morphological and histological comparison between mouse and human Purkinje cells (PCs) in the cerebellum. The study reveals species-specific differences that have not previously been reported despite numerous observations of these species. While mouse PCs show morphological heterogeneity and occasional multi-innervation by climbing fibers (CFs), human PCs exhibit a widespread, multi-dendritic structure that exceeds expectations based on allometric scaling. Specifically, human PCs are significantly larger, and exhibit increased spine density, with a unique cluster-like morphology not found in mice.

      Strengths:

      The manuscript provides an exceptionally detailed analysis of PC morphology across species, surpassing any prior publication. Major strengths include a systematic and thorough methodology, rigorous data analysis, and clear presentation of results. This work is likely to become the go-to resource for quantitation in this field. The authors have largely achieved their aims, with the results effectively supporting their conclusions.

      We are grateful to this reviewer for their thoughtful assessment that this work will be a go-to resource for the field.

      Weaknesses:

      There are a few concerns that need to be addressed, specifically related to details of the methodology as well as data interpretation based on the limits of some experimental approaches. Overall, these weaknesses are minor.

      We thank this reviewer for their careful reading of the manuscript and for highlighting limitations and weaknesses in the methodology. We are in full agreement that while interpretation is somewhat limited, there is still value in their description. As detailed below in response to this reviewer’s recommendations, we provide more description of our imaging resolution. This additional detail clarifies that our quantitation is appropriate for the scale of the objects being measured and provides critical information to help readers assess the findings as they may pertain to their own work.

      Reviewer #2 (Public review):

      Summary:

      This manuscript aims to follow up on a previously published paper (Busch and Hansel 2023) which proposed that the morphological variation of dendritic bifurcation in Purkinje cells in mice and humans is indicative of the number of climbing fiber inputs, with dendritic bifurcation at the level of the soma resulting in a proportion of these neurons being multi-innervated. The functional and anatomical climbing fiber data was obtained solely from mice since all human tissue was embalmed and fixed, and the extension of these findings to human Purkinje cells was indirect. The current comparative anatomy study aims to resolve this question in human tissue more directly and to further analyse in detail the properties of adult human Purkinje cell dendritic morphology.

      Strengths:

      The authors have carried out a meticulous anatomical quantification of human Purkinje cell dendrites, in tissue preparations with a better signal-to-noise ratio than their previous study, comparing them with those from mice. Importantly, they now present immunolabelling results that trace climbing fiber axons innervating human PCs. As well as providing detailed analyses of spine properties and interesting new findings of human PC dendritic length and spine types, the work confirms that human PCs that have two clearly distinct dendritic branches have an approximately x% chance of receiving more than one CF input, segregated across the two branches. Albeit entirely observational, the data will be of widespread interest to the cerebellar field, in particular, those building computational models of Purkinje cells.

      We thank this reviewer for their positive and considered assessment of our work. We enthusiastically agree that while these data are descriptive in nature, they may be of interest across modalities of cerebellar research and will provide a more detailed framework for cross-species comparisons and single cell computational modeling, which remains a critical tool to explore the human case given the inaccessibility of physiological experimentation.

      Weaknesses:

      The work is, by necessity, purely anatomical. It remains to be seen whether there are any functional differences in ion channel expression or functional mapping of granule inputs to human PCs compared with the mouse that might mitigate the major differences in electrotonic properties suggested.

      We are in full agreement with the reviewer that the focused anatomical description of this manuscript could not make strong assertions about function given that cellular and circuit physiology is determined by many additional factors that remain unexamined. We appreciate that the reviewer acknowledges that this is out of necessity as those factors are inaccessible to experimentation at the current time; however, we are enthusiastic that our current findings will motivate future work that will shed light on these critical additional features of the system, both in rodents and humans.

      Reviewer 1 (Recommendations for the authors):

      PCs are now known to be genetically diverse, with unique PC types found only in humans. Could this cellular diversity contribute to the differences observed between species in this study? This possibility should be at least discussed in the context of the findings.

      We agree that this is a fascinating possibility. Perhaps the most detailed recent study (Sepp et al., Nature 625, 2024), in a conservative assessment, describes four developmental PC subtypes in mice that are identical in humans. The study points out, though, that the subtype ratio changes over the course of development. Taken together with the possibility of additional human-specific subtypes, this suggests a genetic basis for morphological as well as physiological diversity. This is now discussed on p. 7. It should be kept in mind, however, that other factors, such as push-pull influences during tissue growth, might also play a role.

      The human tissue used in this study was obtained from elderly individuals, while the mouse tissue was not. It is unclear whether the age difference might influence the findings, and this warrants further discussion or control.

      We share this concern, in particular regarding the spine / spine cluster analysis, as here tissue quality and/or degenerative effects might play a role. We additionally analyzed a tissue sample from a 37-year-old human and observed the same spine clusters as in the other human brains. This is now described on p. 4 of the revised manuscript.

      The study includes spine size comparisons, but it is not clear if the point spread function (PSF) of the microscope provides the necessary resolution for these quantitative assessments. For instance, are multi-headed spines truly multi-headed, or could this be an artifact of limited resolution?

      This is an important point. We addressed it by calculating the Rayleigh limit (more conservative than the Abbe limit) as 248.4 nm for the equipment and conditions used (Methods, p. 22). On pages 3-5, we updated our Results section accordingly to point out which quantifications are well supported and to discuss the limitations.
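      For reference, the standard lateral-resolution expressions are given below, with λ the emission wavelength and NA the numerical aperture of the objective. The specific λ and NA behind the 248.4 nm figure are not stated in this response, so none are assumed here. Because the Rayleigh criterion is 1.22 times the Abbe limit for the same optics, it yields the larger (more conservative) minimum resolvable distance.

```latex
r_{\mathrm{Rayleigh}} = \frac{0.61\,\lambda}{\mathrm{NA}}, \qquad
d_{\mathrm{Abbe}} = \frac{\lambda}{2\,\mathrm{NA}}, \qquad
\frac{r_{\mathrm{Rayleigh}}}{d_{\mathrm{Abbe}}} = 1.22
```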

      Reviewer 2 (Recommendations for the authors):

      This is nice work which must have been very time-consuming. It would be good to make sure that the technical details are properly discussed, to quantify the data properly. Please include details of how you measured the resolution of the microscope used to evaluate spine size.

      See our response to the last comment of Referee 1 above.

      The figure panels are mostly satisfactory, but they are exceptionally crowded and will probably be difficult to read at the final size. Some work tidying these would be worth it. In Figure 3B, include mention of open and blue triangles in legend. In 3E, the dendritic branches are shown at a different gray scale. You have not done this elsewhere, so probably good to mention it in the legend.

      Figure 3 and its legend have been updated / improved accordingly.

      The definition of horizontal and vertical is not absolutely clear. Perhaps re-assess this bit of the text. Does it mean that you did not include cells that were neither vertical nor horizontal?

      We categorized PCs with an angle >30° relative to the PC layer as ‘vertical’, and those with an angle <30° relative to the PC layer as ‘horizontal’. All PCs are covered by these categories. This is now described on p. 5.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #2 suggested the addition of new data to address the following points:

      Reviewer #2: 

      (1) Oncogenic GOF - the main data shown for GOF are the survival curve and enhanced metastasis. Often, GOF is exemplified at the cellular level as enhanced migration and invasion, which are standard assays to support the GOF. As such, the authors should perform these assays using either tumor cells derived from the mice or transformed fibroblasts from these mice. This will provide important and confirmatory evidence for GOF for Y217C. 

      We thank the referee for this comment. Our previous data indicated accelerated tumor progression and increased metastasis in Trp53<sup>Y217C/Y217C</sup> mice, which provided in vivo evidence of an oncogenic gain of function (GOF) for the p53<sup>Y217C</sup> mutant. However, we agree that it was important to provide additional evidence of GOF at the cellular level. 

      Many cellular assays were previously used to evaluate the GOF of p53 mutants, including those listed by the referee. Importantly, Zhao et al. recently showed that a common property of several p53 mutants proposed to have oncogenic GOF is their capacity to promote chromosomal instability (Zhao et al. (2024) Nat. Commun. 15, 180). For the revision of our manuscript, we compared the frequencies of chromosomal alterations occurring spontaneously in WT, Trp53<sup>Y217C/Y217C</sup> and Trp53<sup>-/-</sup> mouse embryonic fibroblasts (MEFs). Chromosome breaks, radial chromosomes and double minutes (DMs) were more frequent in Trp53<sup>Y217C/Y217C</sup> MEFs than in WT or Trp53<sup>-/-</sup> MEFs, providing clear evidence of a GOF promoting chromosomal instability. This new result is presented in Figure 2G and mentioned in the revised abstract. 

      Furthermore, as pointed out by referee #1 in a confidential comment, increased NF-kB signaling provides evidence of p53 GOF. Accordingly, Zhao et al. proposed that the capacity of p53<sup>G245D</sup> and p53<sup>R273H</sup> to promote chromosomal instability ultimately led to activation of a noncanonical NF-kB signaling that would promote tumor cell invasion and metastasis. Consistent with their work, we now report that the GSEA of Trp53<sup>Y217C/Y217C</sup> and Trp53<sup>-/-</sup> thymocytes revealed an upregulation of non-canonical NF-kB signaling in Trp53<sup>Y217C/Y217C</sup> thymic cells (a new result presented in Figure 5F and Supplementary Figure S13).  These new data lead us to mention in the revised discussion that “similar mechanisms might underlie the oncogenic properties of the p53<sup>Y217C</sup>, p53<sup>G245D</sup> and p53<sup>R273H</sup> mutants”.

      (2) Novel target gene activation - while a set of novel targets appears to be increased in the Y217C cells compared to the p53 null cells, it is unclear how they are induced. The authors should examine if mutant p53 can bind to their promoters through CHIP assays, and, if these targets are specific to Y217C and not the other hot-spot mutations. This will strengthen the validity of the Y217C's ability to promote GOF. 

      We respectfully disagree with the referee's view that the validity of p53<sup>Y217C</sup>’s ability to promote a GOF would be strengthened by showing that p53<sup>Y217C</sup> binds to the promoters of genes upregulated in Trp53<sup>Y217C/Y217C</sup> cells. In fact, Pal et al. recently performed the experiment proposed by the referee: by integrating RNAseq and ChIPseq data from MCF10A cells expressing p53<sup>Y220C</sup>, the human equivalent of p53<sup>Y217C</sup>, they found that 95% of the genes upregulated upon p53<sup>Y220C</sup> expression were upregulated indirectly, without p53<sup>Y220C</sup> binding to their promoters (Pal et al. (2023) NPJ Breast Cancer 9, 78). Consistent with our data, Pal et al. notably found that the expression of p53<sup>Y220C</sup> increased cell migration and invasion, which correlated with increased expression of S100A8 and S100A9. However, the promoters of S100A8 and S100A9 were not bound by p53<sup>Y220C</sup>, indicating an indirect mechanism for their upregulation. Furthermore, the study by Zhao et al. mentioned above also suggested an indirect mechanism of GOF, because the upregulation of inflammation-related genes by a mutant p53 protein was proposed to result from signaling cascades triggered by chromosomal instability. Our data appear consistent with both studies: p53<sup>Y217C</sup> was undetectable or barely detectable in the chromatin fraction of Trp53<sup>Y217C/Y217C</sup> cells, and Trp53<sup>Y217C/Y217C</sup> cells exhibited increased chromosomal instability and increased NF-kB signaling compared to Trp53<sup>-/-</sup> cells, which may suggest indirect mechanisms for p53<sup>Y217C</sup> GOF. 

      Nevertheless, we agree with the referee that it was important to provide stronger evidence of p53<sup>Y217C</sup> GOF in the revised manuscript. In that regard, we were intrigued by the perinatal death of most Trp53<sup>Y217C/Y217C</sup> females, which provided evidence of unexpected teratogenic effects of the mutant. We had proposed that these female-specific teratogenic effects likely resulted from a pro-inflammatory GOF of p53<sup>Y217C</sup>. This hypothesis relied on the RNAseq pro-inflammatory signature in Trp53<sup>Y217C/Y217C</sup> thymic cells, and on the fact that the glycoprotein CD44, known to drive inflammation, had been identified as a key gene in open neural tube defects. However, we had not tested this hypothesis experimentally. In the revised version of the manuscript, we tested it: we mated Trp53<sup>+/Y217C</sup> female mice with Trp53<sup>Y217C/Y217C</sup> males, then administered supformin (LCC-12), a potent CD44 inhibitor known to attenuate inflammation in vivo, to pregnant mice by oral gavage. The administration of supformin led to a five-fold increase in the proportion of weaned Trp53<sup>Y217C/Y217C</sup> females in the progeny, suggesting that reducing inflammation in utero rescued some of the Trp53<sup>Y217C/Y217C</sup> female embryos. This new result is presented in Figure 5G and Supplementary Table S6, and mentioned in the abstract. 

      We believe that these new results, as well as the additional GSEA analyses revealing increased NF-kB signaling in Trp53<sup>Y217C/Y217C</sup> cells, further emphasize the importance of inflammation in the GOF of the p53<sup>Y217C</sup> mutant. Accordingly, we slightly modified the title of our article, to include the notion that Trp53<sup>Y217C</sup> is an inflammation-prone mouse model. We also end the article by summarizing the effects of p53<sup>Y217C</sup> in vivo, in a new Supplementary Table S7 that compares the LOF effects of a p53 KO with the (LOF+GOF) effects of the p53<sup>Y217C</sup> mutant. 

      (3) Dominant negative effect - the authors' claim of lack of DN effect needs to be strengthened further, as most p53 hot-spot mutations do exhibit DN effect. At the minimum, the authors should perform additional treatment with nutlin and gamma irradiation (or cytotoxic/damaging agents) and examine a set of canonical p53 target genes by qRT-PCR to strengthen their claim. 

      Our previous data indicated identical tumor onset and survival in Trp53<sup>+/Y217C</sup> and Trp53<sup>+/-</sup> mice, leading us to conclude that, at least for spontaneous tumorigenesis, there was no evidence of a Dominant Negative Effect (DNE) in vivo. Here, we followed the referee’s suggestion and evaluated the possibility of a DNE in response to stress, by comparing WT, Trp53<sup>+/Y217C</sup> and Trp53<sup>+/-</sup> MEFs or thymocytes. We analyzed different types of stress (Nutlin, Doxorubicin, γ-irradiation) and different types of cellular responses (transactivation of classical p53 target genes, cell cycle arrest, apoptosis), and the results lead us to conclude that there is little if any DNE in response to various stresses either. These new data are mentioned in a paragraph evaluating the possibility of DNE or GOF at the cellular level, and presented in a new Supplementary Figure S6.

    1. Author response:

      We thank the reviewers of this manuscript for their thoughtful and detailed feedback, and agree that they bring up valid points. We also thank them for their suggestions on how to improve this study. We intend to revise this manuscript to help address these concerns and in the future will submit a revised version that will hopefully be improved in terms of the clarity of the text and rigor of the experimental findings.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      In this valuable study, García-Vázquez et al. provide solid evidence suggesting that G2 and S phases expressed protein 1 (GTSE1), is a previously unappreciated non-pocket substrate of cyclin D1-CDK4/6 kinases. To this end, this study holds a promise to significantly contribute to an improved understanding of the mechanisms underpinning cell cycle progression. Notwithstanding these clear strengths of the article, it was thought that the study may benefit from establishing the precise role of cyclin D1-CDK4/6 kinase-dependent GTSE1 phosphorylation in the context of cell cycle progression, …

      We do not claim, as the editors and reviewers appear to have interpreted, that GTSE1 is phosphorylated by cyclin D1-CDK4 in the G1 phase of the cell cycle under normal physiologic conditions. Indeed, we agree with the existing literature indicating that in cells that do not express high levels of cyclin D1, GTSE1 is expressed predominantly during S and G2 phase (hence the name GTSE1, which stands for G-Two and S phases expressed protein 1) and is phosphorylated by mitotic cyclins in early mitosis. Even during G1, when the levels of cyclin D1 peak, GTSE1 is not phosphorylated in normal cells. This could be due either to a higher affinity between GTSE1 and mitotic cyclins as compared to D-type cyclins or to a higher concentration of mitotic cyclins compared to D-type cyclins. In the current manuscript, we show that higher levels of cyclin D1 can drive the sustained phosphorylation of GTSE1 across all cell cycle points. To reach this conclusion, we do not rely only on the overexpression of exogenous cyclin D1. In fact, we observe a similar effect when we deplete endogenous AMBRA1, resulting in the stabilization of endogenous cyclin D1 in all cell cycle phases (see Figure 2G and Figure supplement 3B). As we had already mentioned in the Discussion section, we propose that GTSE1 is phosphorylated by CDK4 and CDK6 particularly in pathological states, such as cancers displaying overexpression of D-type cyclins (i.e., it is possible that the overexpression overcomes the lower affinity of the cyclin D-GTSE1 complex). In turn, phosphorylation of GTSE1 induces its stabilization, leading to increased levels that, as expected based on the existing literature, contribute to enhanced cell proliferation. So, the role of the cyclin D1-CDK4/6 kinase-dependent GTSE1 phosphorylation is to stabilize GTSE1 independently of the cell cycle.
      In sum, our study suggests that overexpression of cyclin D1, which is often observed in cancer cells beyond the G1 phase, induces phosphorylation of GTSE1 at all points in the cell cycle.

      … obtaining more direct evidence that cyclin D1-CDK4/6 kinase phosphorylate indicated sites on GTSE1 (e.g., S454) …

      We show that treatment of cells with palbociclib completely abolished the effect of cyclin D1-CDK4 on the GTSE1 shift observed using Phos-tag gels (Figure 2H). Moreover, mutagenesis analysis shows that S91, S262, and S724 are phosphorylated in a cyclin D1-CDK4-dependent manner (Figure 2F and Figure supplement 3A). Compared to wild-type GTSE1, a triple mutant (S91A/S262A/S724A) displayed loss of slower-migrating bands upon co-expression of cyclin D1-CDK4, suggesting diminished phosphorylation. Nevertheless, a residual slow-migrating band persisted, prompting us to further mutate the triple GTSE1 mutant at S331 or S454 (individually); these sites do not have a CDK-phosphorylation consensus but were identified in several published phospho-proteomics studies. Of these two quadruple mutants, only the one containing the S454A mutation showed complete abrogation of any shift in Phos-tag™ gels (Figure 2F). These studies suggest that four major sites (S91, S262, S454, and S724) are phosphorylated (directly and/or indirectly) in a cyclin D1-CDK4-dependent manner.

      … and mapping a degron in GTSE1 whose function may be blocked by cyclin D1-CDK4/6 kinase-dependent phosphorylation.

      We show that stabilization or overexpression of cyclin D1, which is often observed in human cancers, promotes GTSE1 phosphorylation on S91, S262, S454, and S724, resulting in GTSE1 stabilization. Similarly, a phospho-mimicking mutant in which the four serine residues at positions 91, 262, 454, and 724 are replaced with aspartate displays an increased half-life. While we appreciate the editor’s suggestion and agree that these are interesting questions, we would like to respectfully point out that mapping the GTSE1 degron and understanding how it is affected by cyclin D1-CDK4/6-dependent phosphorylation is outside the scope of the current project and will require an extensive set of experiments and tools. Accordingly, the three reviewers did not ask us to map the GTSE1 degron. We plan to address these interesting questions in a follow-up study.

      Reviewer #1 (public review):

      Summary:

      García-Vázquez et al. identify GTSE1 as a novel target of the cyclin D1-CDK4/6 kinases. The authors show that GTSE1 is phosphorylated at four distinct serine residues and that this phosphorylation stabilizes GTSE1 protein levels to promote proliferation.

      Strengths:

      The authors support their findings with several previously published results, including databases. In addition, the authors perform a wide range of experiments to support their findings.

      Weaknesses:

      I feel that important controls and considerations in the context of the cell cycle are missing. Cyclin D1 overexpression, Palbociclib treatment and apparently also AMBRA1 depletion can lead to major changes in cell cycle distribution, which could strongly influence many of the observed effects on the cell cycle protein GTSE1. It is therefore important that the authors assess such changes and normalize their results accordingly.

      We have approached the question of GTSE1 phosphorylation to account for potential cell cycle effects from multiple angles: 

      (i) We conducted in vitro experiments with purified, recombinant proteins and showed that GTSE1 is phosphorylated by cyclin D1-CDK4 in a cell-free system (Figure 2A-C). These experiments provide direct evidence of GTSE1 phosphorylation by cyclin D1-CDK4 without the influence of any other cell cycle effectors. 

      (ii) We present data using synchronized AMBRA1 KO cells (new Figure 2G and Figure supplement 3B). In agreement with what we had shown previously (Simoneschi et al., Nature 2021, PMC8875297), AMBRA1 KO cells progress faster through the cell cycle, but they are still synchronized, as shown, for example, by the mitotic phosphorylation of Histone H3, which peaks at 32 hours after serum re-addition as in parental cells. Under these conditions, we observed that while phosphorylation of GTSE1 in parental cells is evident only at the last two time points, AMBRA1 KO cells exhibited sustained phosphorylation of GTSE1 across all cell cycle phases. This was already evident using Phos-tag gels, as in the top panel of the old Figure 2G. We have now re-run one of the biological triplicates of the synchronized cells using a higher concentration of Zn<sup>2+</sup>-Phos-tag reagent and lower voltage to allow better separation of the phosphorylated bands. Under these conditions, GTSE1 phosphorylation is more readily appreciable (top panel of the new Figure 2G). This experiment provides evidence that high levels of cyclin D1 in AMBRA1 KO cells affect GTSE1 phosphorylation independently of the specific point in the cell cycle. 

      (iii) The relatively short half-life of GTSE1 (<4 hours) makes its levels sensitive to acute treatments such as palbociclib or acute AMBRA1 depletion. The effects of these treatments on GTSE1 levels are measurable within a time frame too short to significantly affect cell cycle progression. For example, we used cells in which endogenous AMBRA1 is fused at its N-terminus to a mini-Auxin Inducible Degron (mAID). This system allows for rapid and inducible degradation of AMBRA1 upon addition of auxin, thereby minimizing compensatory cellular rewiring. Again, we observed an increase in GTSE1 levels upon acute ablation of AMBRA1 (i.e., within 8 hours) (Figure 3B), when no significant effects on cell cycle distribution are observed (please see Simoneschi et al., Nature 2021, PMC8875297 and Rona et al., Mol. Cell 2024, PMC10997477).

      Altogether, the above lines of evidence support our conclusion that GTSE1 is a target of cyclin D1-CDK4, independent of cell cycle effects.

      In conclusion, we do not claim that GTSE1 is phosphorylated by cyclin D1-CDK4 in the G1 phase of the cell cycle under normal physiologic conditions. Indeed, we agree with the existing literature indicating that in cells that do not express high levels of cyclin D1, GTSE1 is expressed predominantly during S and G2 phase (hence the name GTSE1, which stands for G-Two and S phases expressed protein 1) and is phosphorylated by mitotic cyclins in early mitosis. Even during G1, when the levels of cyclin D1 peak, GTSE1 is not phosphorylated in normal cells. This could be due either to a higher affinity between GTSE1 and mitotic cyclins as compared to D-type cyclins or to a higher concentration of mitotic cyclins compared to D-type cyclins. In the current manuscript, we show that higher levels of cyclin D1 can drive the sustained phosphorylation of GTSE1 across all cell cycle points. To reach this conclusion, we do not rely only on the overexpression of exogenous cyclin D1. In fact, we observe a similar effect when we deplete endogenous AMBRA1, resulting in the stabilization of endogenous cyclin D1 in all cell cycle phases (see Figure 2G and Figure supplement 3B). As we had already mentioned in the Discussion section of the original submission, we propose that GTSE1 is phosphorylated by CDK4 and CDK6 particularly in pathological states, such as cancers displaying overexpression of D-type cyclins (i.e., it is possible that the overexpression overcomes the lower affinity of the cyclin D1-GTSE1 complex). In turn, phosphorylation of GTSE1 induces its stabilization, leading to increased levels that, as expected based on the existing literature, contribute to enhanced cell proliferation. In sum, our study suggests that overexpression of cyclin D1, which is often observed in cancer cells beyond the G1 phase, induces phosphorylation of GTSE1 at all points in the cell cycle.

      Reviewer #2 (public review):

      Summary:

      The manuscript by García-Vázquez et al. identifies the G2 and S phases expressed protein 1 (GTSE1) as a substrate of the CycD-CDK4/6 complex. CycD-CDK4/6 is a key regulator of the G1/S cell cycle restriction point, which commits cells to enter a new cell cycle. This kinase is also an important therapeutic cancer target of approved drugs, including Palbociclib. Identification of substrates of CycD-CDK4/6 can therefore provide insights into cell cycle regulation and the mechanism of action of cancer therapeutics. A previous study identified GTSE1 as a target of CycB-Cdk1, but this appears to be the first study to address the phosphorylation of the protein by Cdk4/6.

      The authors identified GTSE1 by mining an existing proteomic dataset of proteins elevated in AMBRA1 knockout cells. The AMBRA1 complex normally targets D cyclins for degradation. From this list, they then identified proteins that contain a CDK4/6 consensus phosphorylation site and were responsive to treatment with Palbociclib.

      The authors show that CycD-CDK4/6 overexpression induces a shift in GTSE1 on Phos-tag gels that can be reversed by Palbociclib. In vitro kinase assays also showed phosphorylation by CDK4. The phosphorylation sites were then identified by mutagenizing the predicted sites and using Phos-tag gels to see which mutations eliminated the shift.

      The authors go on to show that phosphorylation of GTSE1 affects the steady state level of the protein. Moreover, they show that expression and phosphorylation of GTSE1 confer a growth advantage on tumor cells and correlate with poor prognosis in patients.

      Strengths:

      The biochemical and mutagenesis evidence presented convincingly show that the GTSE1 protein is indeed a target of the CycD-CDK4 kinase. The follow-up experiments begin to show that the phosphorylation state of the protein affects function and has an impact on patient outcomes.

      Weaknesses:

      It is not clear at which stage in the cell cycle GTSE1 is being phosphorylated and how this is affecting the cell cycle. Considering that the protein is also phosphorylated during mitosis by CycB-Cdk1, it is unclear which phosphorylation events may be regulating the protein.

      Please see point (ii) and the last paragraph in the response to Reviewer #1.  Moreover, we show that, compared to the amino acids phosphorylated by cyclin D1-CDK4, cyclin B1-CDK1 phosphorylates GTSE1 on either additional residues or different sites (Figure 2H). We also show that expression of a phospho-mimicking GTSE1 mutant leads to accelerated growth and an increase in the cell proliferative index (Figure 4B,C and new Figure supplement 4D-E).  Finally, we have also evaluated the cell cycle distributions by flow cytometry (new Figure supplement 4F). These analyses show that the expression of a phospho-mimicking GTSE1 mutant induces a decrease in the percentage of cells in G1 and an increase in the percentage of cells in S, similar to what is observed in AMBRA1 KO cells.

      Reviewer #3 (public review):

      Summary:

      This paper identifies GTSE1 as a potential substrate of cyclin D1-CDK4/6 and shows that GTSE1 correlates with cancer prognosis, probably through an effect on cell proliferation. The main problem is that the phosphorylation analysis relies on the over-expression of cyclin D1. It is unclear if the endogenous cyclin D1 is responsible for any phosphorylation of GTSE1 in vivo, and what, if anything, this moderate amount of GTSE1 phosphorylation does to drive proliferation.

      Strengths:

      There are few bona fide cyclin D1-Cdk4/6 substrates shown to be important in vivo, so GTSE1 represents a potentially important finding for the field. Currently, the only cyclin D1 substrates involved in proliferation are the Rb family proteins.

      Weaknesses:

      The main weakness is that it is unclear if the endogenous cyclin D1 is responsible for phosphorylating GTSE1 in the G1 phase. For example, in Figure 2G there doesn't seem to be a higher band in the phos-tag gel in the early time points for the parental cells. This experiment could be redone with the addition of palbociclib to the parental to see if there is a reduction in GTSE1 phosphorylation and an increase in the amount in the G1 phase as predicted by the authors' model. The experiments involving palbociclib do not disentangle cell cycle effects. Adding Cdk4 inhibitors will progressively arrest more and more cells in the G1 phase and so there will be a reduction not just in Cdk4 activity but also in Cdk2 and Cdk1 activity. More experiments, like the serum starvation/release in Figure 2G, with synchronized populations of cells would be needed to disentangle the cell cycle effects of palbociclib treatment.   

      Please see the last paragraph in the response to Reviewer #1.  Concerning the experiments involving palbociclib, we limited confounding effects on the cell cycle by treating cells with palbociclib for only 4-6 hours. Under these conditions, there is simply not enough time for S and G2 cells to arrest in G1.

      It is unclear if GTSE1 drives the G1/S transition. Presumably, this is part of the authors' model and should be tested.

      We are not claiming that GTSE1 drives the G1/S transition (please see the last paragraph in the response to Reviewer #1). GTSE1 is known to promote cell proliferation, but how it performs this task is not well understood.  Our experiments indicate that, when overexpressed, cyclin D1 promotes GTSE1 phosphorylation and its consequent stabilization.  In agreement with the literature, we show that higher levels of GTSE1 promote cell proliferation.  To measure cell cycle distribution upon expressing various forms of GTSE1, we have now performed FACS analyses (new Figure supplement 4F). These analyses show that the expression of a phospho-mimicking GTSE1 mutant induces a decrease in the percentage of cells in G1 and an increase in the percentage of cells in S, similar to what is observed in the AMBRA1 KO cells shown in the same panel and in Simoneschi et al. (Nature 2021, PMC8875297).

      The proliferation assays need to be more quantitative. Figure 4B should be plotted on a log scale so that the slope can be used to infer the proliferation rate of an exponentially increasing population of cells. Figure 4c should be done with more replicates and error analysis since the effects shown in the lower right-hand panel are modest.

      In Figure 4B, we plotted data on a linear scale, as done in the past (Donato et al. Nature Cell Biol. 2017, PMC5376241), to better underline the changes in total cell number over time.  The experiments in Figure 4B were performed in triplicate, statistical significance was determined using unpaired t-tests with p-values < 0.05, and error bars represent the mean +/- SEM.  In Figure 4C, error analysis was not included for simplicity, given the complexity of the data.  We have now included the other two sets of experiments (new Figure supplement 4D,E).  While the effects shown in the lower right-hand panel of Figure 4C are modest, they demonstrate the same trend as those observed in the AMBRA1 KO cells (Figure 4C and Simoneschi et al., Nature 2021, PMC8875297). It is important to note that this effect is achieved through the stable expression of a single phospho-mimicking protein, whereas AMBRA1 KO cells exhibit changes in numerous cell cycle regulators. Moreover, these effects are obtained by growing cells in culture for only 5 days. A similar impact on cell growth in vivo over an extended period could pose significant risks in the long term.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figure 1E is referenced before 1D. The authors should consider switching D and E.

      Done.

      Figure 1D-E: The authors correctly note in the introduction that GTSE1 is encoded by a cell cycle-dependently expressed gene. Given that cell cycle genes are often associated with poor prognosis (e.g., see Whitfield et al., 2006 Nat. Rev. Cancer), this would be expected to correlate with poor prognosis. This should be mentioned in the results section.

      We agree that the overexpression of certain (but not all) cell cycle-regulated genes is prognostically unfavorable across various cancer types, and we cited Whitfield et al., 2006 Nat. Rev. Cancer.  However, our data indicate that phosphorylation of GTSE1 induces its stabilization and, consequently, its levels no longer oscillate during the cell cycle (new Figure 2G and Figure supplement 3B).  Moreover, analyzing data from the Clinical Proteomic Tumor Analysis Consortium, we observed an enrichment of GTSE1 phospho-peptides (normalized to total protein) within a pan-cancer cohort compared with adjacent, corresponding normal tissues (Figure 2I).

      Figure 2F: Contrast is too high. Blot images should not contain fully saturated black or white.

      We corrected the contrast.

      Figure 2G and Figure Supplement 3B: It looks like AMBRA1 KO cells do not synchronize properly in response to serum withdrawal. The cell cycle distribution should be checked by FACS. Otherwise, it is unclear whether changes in GTSE1 (phospho) levels are only due to indirect changes in the cell cycle distribution.

      Synchronization of both parental and AMBRA1 KO cells is demonstrated by the fact that the phosphorylation of Histone H3 peaks at 32 hours after serum readdition in both cases (Figure supplement 3B). 

      Figure 2I: It is important that phospho-GTSE1 levels are normalized to total GTSE1 levels to understand the distinct contributions of changes in GTSE1 levels and of CCND1-CDK4-driven phosphorylation.

      Done.

      Figure 3A-B: These experiments should also be controlled for cell cycle distribution. Is this effect specific to GTSE1 and other AMBRA1 targets or are other G2/M cell cycle proteins also affected?

      The relatively short half-life of GTSE1 (<4 hours) makes its levels sensitive to acute treatments such as Palbociclib or acute AMBRA1 depletion. The effects of these treatments on GTSE1 levels are measurable within a time frame too short to significantly affect cell cycle progression. For example, we used cells in which endogenous AMBRA1 is fused to a mini-Auxin Inducible Degron (mAID) at its N-terminus. This system allows for rapid and inducible degradation of AMBRA1 upon addition of auxin, thereby minimizing compensatory cellular rewiring. Again, we observed an increase in GTSE1 levels upon acute ablation of AMBRA1 (i.e., within 8 hours) (Figure 3B), a time point at which no significant effects on cell cycle distribution are observed (please see Simoneschi et al., Nature 2021, PMC8875297 and Rona et al., Mol. Cell 2024, PMC10997477).

      Figure 4: It should be noted that the correlation with cell proliferation and cell cycle protein expression is expected for any cell cycle protein, including GTSE1.

      Actually, the main point of Figure 4 is to show that expression of the phospho-mimicking mutant of GTSE1 promotes cell proliferation. Comparative analysis revealed that cells overexpressing either wild-type GTSE1 or its phospho-deficient form exhibited significantly reduced proliferation rates compared to those expressing the phospho-mimicking mutant (Figure 4B,C). 

      The two-decades-old references 33 and 34 are not well suited to support the notion for Cyclin D1 that "the full spectrum of substrates and their impact on cellular function and oncogenesis remain poorly explored." More recent references should be used to show that this is still the case.

      We added more recent references.

      The authors conclude that their "data indicate that cyclin D1-CDK4 is responsible for the phosphorylation of GTSE1 on four residues (S91, S262, S454, and S724)." However, the authors' data do not exclude a role for their siblings cyclin D2, cyclin D3, and CDK6. Reflecting this, the conclusions should be toned down.

      The analysis of the sites phosphorylated in GTSE1 was performed by experimentally co-expressing cyclin D1-CDK4 (Figure 2F, Figure 2H, and Figure supplement 3A), hence our statement.  Yet, we agree that in cells, cyclin D2, cyclin D3, and CDK6 can contribute to GTSE1 phosphorylation. 

      The authors claim that they "observed that in human cells, when D-type cyclins are stabilized in the absence of AMBRA1, GTSE1 becomes phosphorylated also in G1." However, the G1-specific data presented by the authors are not controlled for, and it is unclear whether these phosphorylation events actually occur in G1 cells.

      We now provide a WB in which GTSE1 phosphorylation is more evident (top panel of the new Figure 2G) (please see point (ii) in the response to the public review of Reviewer #1).  This experiment clearly shows that in AMBRA1 KO cells, GTSE1 is phosphorylated at all points in the cell cycle. Synchronization of both parental and AMBRA1 KO cells is demonstrated by the fact that phosphorylation of Histone H3 peaks at 32 hours after serum re-addition in both cases (Figure supplement 3B). 

      Reviewer #2 (Recommendations for the authors):

      (1) It is not clear from the presented data at which point in the cell cycle phosphorylation of GTSE1 affects the steady-state level of the protein. The implication that GTSE1 is a target of CycD-CDK4 would suggest that the protein is stabilized at G1/S. Can this effect be observed?

      Please see the last paragraph in the response to the public review of Reviewer #1.

      (2) Considering the previous study showing that GTSE1 is also phosphorylated during mitosis by CycB-Cdk1, do levels of GTSE1 protein change during the cell cycle? Do changes in GTSE1 levels correlate with phosphorylation during the cell cycle? Cell synchronization experiments such as double thymidine and subsequent phostag analysis could shed some light on these questions.

      Please see the last paragraph in the response to the public review of Reviewer #1.

      (3) The authors show that the phosphomimetic mutants of GTSE1 confer a growth advantage on cells. The mechanism of this growth advantage is unclear. Is this effect due to a shorter cell cycle, enhanced survival, or another mechanism?

      We did not observe increased cell survival when the phosphomimetic mutants of GTSE1 are expressed.  We show that phosphorylation of GTSE1 induces its stabilization, leading to increased levels that, as expected based on the existing literature, contribute to enhanced cell proliferation.  Thus, the role of cyclin D1-CDK4/6-dependent phosphorylation of GTSE1 is to stabilize the protein.

      (4) Other minor points - all of the presented immunoblots do not show molecular weight markers. The IF images require scale bars.

      To prevent overcrowding of the Figures, the sizes of blotted proteins are indicated in the uncropped scans of each blot. Uncropped scans have been deposited in Mendeley at:  https://data.mendeley.com/datasets/xzkw7hrwjr/1. Scale bars have been added to the IF images.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors have leveraged single-cell RNA sequencing of the various stages in the evolution of lung adenocarcinoma to identify the population of macrophages that contributes to tumor progression. They show that S100a4+ alveolar macrophages with high fatty acid metabolic activity, such as palmitic acid metabolism, seem to drive the atypical adenomatous hyperplasia (AAH) stage. These macrophages also seem to induce angiogenesis, promoting tumor growth. Similar types of macrophage infiltration were demonstrated in the progression of human lung adenocarcinomas.

      Strengths:

      Identification of the metabolic pathways that promote angiogenesis-dependent progression of lung adenocarcinomas from early atypical changes to aggressive invasive phenotype could lead to the development of strategies to abort tumor progression.

      We are grateful for your constructive comments. These comments are very helpful for revising and improving our paper and have provided important guiding significance to our study. We have made revisions according to your comments and have provided point-by-point responses to your concerns.

      Weaknesses:

      (1) Can the authors demonstrate what are the functional specialization of the S100a4+ alveolar macrophages that promote the progression of the AAH to the more aggressive phenotype? What are the factors produced by these unique macrophages that induce tumor progression and invasiveness?

      Thank you for your comments. To more comprehensively characterize the functional specialization of the S100a4<sup>+</sup> alveolar macrophages, we expanded the macrophage functional gene sets based on relevant literature and databases and performed enrichment analysis. The results showed that all stages of precancerous progression displayed activation of angiogenic, M2-like, and immunosuppressive functions relative to the normal stage (Figure 4B). As we have demonstrated, S100a4<sup>+</sup> alveolar macrophages predominantly exert pro-angiogenic functions during the AAH phase and may be more biased towards M2-like polarization and immunosuppression during further disease progression. Consistent with this, an S100A4<sup>+</sup> subset of macrophages has been shown to exhibit an M2-like phenotype with immunosuppressive properties in tumor progression [PMID: 34145030]. In addition, S100A4 has been reported to be associated with macrophage M2 polarization, angiogenesis, and tumorigenesis [PMID: 39664586, 36895491, 30221056, 32117590]. The functional status of human S100A4<sup>+</sup> alveolar macrophages is largely similar. The relevant description was added to the Results section as follows: “It was revealed that the capacities for angiogenesis, M2-like polarization, and immunosuppression were found to be stronger in AAH or other precancerous stages relative to the normal stage (Figure 4B). The pro-angiogenic function predominated in the AAH stage, while M2-like and immunosuppressive functions were more prominent in the subsequent precancerous progression.” (page 11, line 262). Our study focuses on the functional phenotypic changes of S100a4<sup>+</sup> alveolar macrophages during the progression from normal to AAH to explain the role of this subpopulation in tumor initiation; accordingly, our preliminary coculture experiments can only indicate its role in the early malignant transformation of epithelial cells.
In future work, we will use in vitro and in vivo experiments to verify that S100a4<sup>+</sup> alveolar macrophages promote the progression of AAH to a more aggressive phenotype. We have added these limitations and potential experimental designs to the Discussion section as follows: “It is worth noting that our mining of S100a4<sup>+</sup> alv-macro remains at the precancerous initiation stage, and further experimental designs are needed to verify its specific contribution at more aggressive stages. For example, FACS sorting of the subpopulation at different stages of disease progression, respectively, for precise functional characterization;” (page 19, line 468).

      To identify the factors produced by these unique macrophages during the induction of malignant transformation, we assayed the culture supernatant of S100a4-OE alveolar macrophages for secreted functional cytokines. The results showed up-regulation of MIP-2, HGF, TNFα, IL-1α, CD27, CT-1, MMP9, 4-1BB, and CD40, and GO enrichment analysis highlighted angiogenesis- and tumorigenesis-related processes (Figure 5L and 5M). We have added the detailed content to the Results section as follows: “Next, we detected tumor-inducing factors secreted by these unique macrophages using Cytokine Antibody Array. We noted the production of macrophage inflammatory protein (MIP)-2, hepatocyte growth factor (HGF), tumor necrosis factor α (TNF-α), IL-1α, MMP9, and CD40, and these cytokine-related biological processes were mainly involved in the regulation of angiogenesis and immune response (Figure 5L and 5M).” (page 13, line 319). In future work, we will also continuously monitor changes in these cytokines during subsequent invasive tumor progression. The following has been added to the Discussion section: “Furthermore, TGF-β and HGF activate vascular endothelial cells and promote proliferation and migration, as well as induce the expression of pro-angiogenic factors such as VEGF (Vimalraj, 2022; Watabe, Takahashi, Pietras, & Yoshimatsu, 2023). Macrophage-derived TNF-α and IL-1α lead tumor cells to produce potent angiogenic factors IL-8 and VEGF, which affect angiogenesis and tumor growth (Torisu et al., 2000). MIP2 and CD40 were also identified as pro-tumor factors associated with angiogenesis (Kollmar, Scheuer, Menger, & Schilling, 2006; Murugaiyan, Martin, & Saha, 2007)…continuous monitoring of the fluctuation of the above factors in bronchoalveolar lavage fluid at corresponding periods;” (page 19, line 461).

      All method details covered in this section have been updated in the Materials and methods.

      (2) Angiogenic factors are not only produced by the S100a4+ cells but also by pericytes and potentially by the tumor cells themselves. Then, how do these factors aberrantly trigger tumor angiogenesis that drives tumor growth?

      Thank you for your comment. In our study, we detected up-regulation of angiogenic factors HIF-1α, VEGF, MMP9, and TGF-β (Figure 5K), and elevation of secreted HGF, IL-1α, and TNF-α (Figure 5L). We provide a detailed description of how these factors are involved in angiogenesis-related tumorigenesis to varying degrees in the Discussion section: “Precancerous lesions of LUAD are angiogenic, and pro-angiogenic factors secreted by cells, including S100a4<sup>+</sup> alv-macro, induce endothelial cell sprouting and chemotaxis, leaving the angiogenic switch activated, prompting the formation of new blood vessels on the basis of the original ones to supply oxygen and nutrients to sustain tumor initiation (Chen et al., 2024; Kayser et al., 2003; van Hinsbergh & Koolwijk, 2008). Under hypoxic conditions, HIF-1α activates numerous factors that contribute to the angiogenic process, including VEGF, which promotes vascular permeability, and MMP9, which breaks down the ECM, promotes endothelial cell migration, and recruits pericytes to provide structural support (Raza, Franklin, & Dudek, 2010; Sakurai & Kudo, 2011). Cytokines secreted into the microenvironment activate macrophages, which subsequently produce angiogenic factors, further promoting angiogenesis (Sica, Schioppa, Mantovani, & Allavena, 2006). Furthermore, TGF-β and HGF activate vascular endothelial cells and promote proliferation and migration, as well as induce the expression of pro-angiogenic factors such as VEGF (Vimalraj, 2022; Watabe, Takahashi, Pietras, & Yoshimatsu, 2023). Macrophage-derived TNF-α and IL-1α lead tumor cells to produce potent angiogenic factors IL-8 and VEGF, which affect angiogenesis and tumor growth (Torisu et al., 2000)…” (page 19, line 449).

      (3) It is not clear how abnormal fatty acid uptake by the macrophages drives the progression of tumors.

      Thank you for your comment, which coincides with our mechanistic exploration. The metabolic status of macrophages influences their pro-tumor properties, and lipid metabolism has been shown to determine the functional polarization of macrophages [PMID: 29111350]. In this study, we observed greater accumulation of lipid droplets in S100a4-OE MH-S, demonstrating enhanced cellular fatty acid uptake (Figure 6A). The pro-angiogenic ability of S100a4<sup>+</sup> alv-macro was confirmed by tube formation assay and cytokine assay (Figure 6B and 5M). Because Cpt1a was thought to play a crucial role in the metabolic paradigm shift of S100a4<sup>+</sup> alv-macro, we performed functional rescue experiments by inhibiting CPT1A activity in S100a4-OE MH-S with etomoxir (ETO). After culture with conditioned medium of MH-S, the proliferation, migration, and ROS production of MLE12 cells were all restored to lower levels (Figure 6E-G). In addition, ETO treatment significantly reversed the angiogenesis, supporting a regulatory role of fatty acid metabolism in macrophage function (Figure 6H). Immunoblotting also revealed restored expression of related proteins (Figure 6I and 6J); these findings reinforced previous analyses of the association of fatty acid metabolism with pro-angiogenic and M2-like functions in S100a4<sup>+</sup> alv-macro. The involvement of PPAR-γ in the regulation of metabolic state was also confirmed. Taken together, we suggest that S100a4<sup>+</sup> alv-macro promotes fatty acid metabolism through the CPT1A-PPAR-γ axis, which enhances its ability to promote angiogenesis and thus drives tumor occurrence. The corresponding content was added to the Results section “S100a4<sup>+</sup> alv-macro drove angiogenesis by promoting Cpt1a-mediated fatty acid metabolism” (page 13, line 327) and to the Discussion section: “We demonstrated the regulation of fatty acid metabolism by CPT1A in S100a4<sup>+</sup> alv-macro as well as the involvement of PPAR-γ. 
Nevertheless, the molecular mechanism that drives the acquisition of metabolic and functional switching properties specific to this cell state still requires further characterization in the context of precancerous lesions. It has been reported that CD36 is the main effector of the S100A4/PPAR-γ pathway, and its mediated fatty acid uptake plays an important role in the tumor-promoting function of macrophages (S. Liu et al., 2021).” (page 18, line 433).

      All method details covered in this section have been supplemented in the Materials and methods.

      (4) Does infusion or introduction of S100a4+ polarized macrophages promote the progression of AAH to a more aggressive phenotype?

      Thank you for your comment. We performed intratracheal instillation of lentivirus-infected S100a4-OE MH-S and culture supernatant in A/J and BALB/c mice, respectively, but no aggressive pathological phenotype has been observed so far, possibly because the observation period was too short for lesions to develop or because the experimental conditions were not yet optimal. We will continue to explore the instillation dose and frequency for long-term monitoring and will simultaneously evaluate the availability of primary alveolar macrophages. We have discussed this as follows: “It is worth noting that our mining of S100a4<sup>+</sup> alv-macro remains at the precancerous initiation stage, and further experimental designs are needed to verify its specific contribution at more aggressive stages…and intratracheal instillation of primary S100a4<sup>+</sup> alv-macro to observe the pathological progression of precancerous lesions.” (page 19, line 468).

      (5) How does Anxa and Ramp1 induction in inflammatory cells induce angiogenesis and tumor progression?

      Thank you for your comment. ANXA2 is an important member of the annexin family of proteins and is expressed on the surface of endothelial cells, macrophages, and tumor cells [PMID: 30125343]. ANXA2 was reported to regulate neoangiogenesis in the tumor microenvironment, most likely through overproduction of plasmin. As a well-established cell-surface receptor for plasminogen (PLG) and tissue plasminogen activator (tPA), ANXA2 promotes the conversion of PLG into plasmin. Plasmin plays a critical role in activating a cascade of inactive proteolytic enzymes, such as metalloproteases (pro-MMPs), and latent growth factors (VEGF and bFGF) [PMID: 12963694, 11487021]. Activated forms of MMPs and VEGF then induce extracellular matrix remodeling, facilitating angiogenesis and tumor development [PMID: 15788416]. Sharma et al. reported that administration of an ANXA2 antibody inhibited tumor angiogenesis and growth concurrently with plasmin generation [PMID: 22044461]; the role of ANXA2 in plasmin activation thus explains its importance in tumor-related angiogenesis. We verified the simultaneous upregulation of ANXA2 and PLG in S100a4-OE MH-S and cocultured HUVEC and MLE12 by immunoblotting (Figure 6D). The relevant description was added to the Results section as follows: “ANXA2 is considered to be a cellular receptor for plasminogen (PLG), often expressed on the surface of endothelial cells, macrophages, and tumor cells, which activates a cascade of pro-angiogenic factors by promoting the conversion of PLG to plasmin, thereby promoting angiogenesis and tumor progression (Semov et al., 2005; Sharma, 2019). We found synergistic upregulation of ANXA2 and PLG expression in S100a4-OE MH-S and cocultured HUVEC and MLE12, which may help explain how ANXA2 induction was involved in angiogenesis and malignant transformation (Figure 6D).” (page 14, line 338).

      Recent studies showed that S100A4 is associated with tumor angiogenesis and progression through its interaction with ANXA2. ANXA2 has been identified as the endothelial receptor for S100A4, and their interaction triggers functional activities directly related to the pathological properties of S100A4, including angiogenesis [PMID: 18608216]. S100A4 has been shown to induce angiogenesis through its interaction with ANXA2 and accelerated plasmin formation [PMID: 15788416, 25303710]. In addition, it is generally believed that ANXA2 participates in malignant cell transformation [PMID: 28867585]. Therefore, we speculate that ANXA2 may promote plasmin production by binding to S100A4, thus promoting angiogenesis and tumor initiation, and we have discussed this accordingly: “The role of ANXA2 in angiogenesis has been widely recognized, and it may facilitate plasmin production by binding to S100A4 and then trigger angiogenesis and malignant cell transformation (Grindheim, Saraste, & Vedeler, 2017; Y. Liu, Myrvang, & Dekker, 2015).” (page 18, line 446).

      In our study, the primary target of our validation was ANXA2 rather than RAMP1, even though the relationship of RAMP1 with angiogenesis has been established [PMID: 20596610]; we have therefore toned down the relevant description in the manuscript.

      (6) For the in vitro studies the authors might consider using primary tumor cells and not cell lines.

      Thank you for your suggestion, which was in our initial experimental plan. However, since S100A4 is not expressed on the cell surface, FACS sorting of primary subset of alveolar macrophages presents technical limitations. We have also attempted overexpression in primary macrophages, but the current overexpression efficiency and cell status are not sufficient to support a subsequent series of experiments. For all these reasons, the alveolar macrophage cell line MH-S and the lung epithelial cell line MLE12 were selected to ensure the consistency and stability of the coculture system.

      In addition, we are optimizing the experimental conditions to achieve coculture of primary macrophages and epithelial cells, and will also establish transgenic mouse models for simultaneous validation. The Discussion has been added as: “Besides, as our previous in vitro results were obtained based on cell lines, we will optimize the experimental conditions to achieve coculture of primary macrophage subset and epithelial cells and establish transgenic mouse models for in vivo validation.” (page 19, line 475).

      Reviewer #2 (Public review):

      Summary:

      The work aims to further understand the role of macrophages in lung precancer/lung cancer evolution.

      Strengths:

      (1) The use of single-cell RNA seq to provide comprehensive characterisation.

      (2) Characterisation of cross-talk between macrophages and the lung precancerous cells.

      (3) Functional validation of the effects of S100a4+ cells on lung precancerous cells using in vitro assays.

      (4) Validation in human tissue samples of lung precancer / invasive lesions.

      We are grateful for your constructive comments. They have been very helpful for revising and improving our paper and have provided important guidance for our study. We have revised the manuscript accordingly and provide point-by-point responses to your concerns below.

      Weaknesses:

      (1) The authors need to provide clarification of several points in the text.

      Thank you for your comment. We have clarified these points in the manuscript and responded to all your concerns in detail. Please see our responses under Recommendations for the authors.

      (2) The authors need to carefully assess their assumptions regarding the role of macrophages in angiogenesis in precancerous lesions.

      Thank you for your comment. We have cited relevant literature supporting the occurrence of angiogenesis in precancerous lesions, and we have demonstrated the contribution of S100a4<sup>+</sup> alveolar macrophages using a tube formation assay and a cytokine assay. In addition, we have discussed the relevant limitations of this study and aim to provide more robust evidence in future work. Please see our responses under Recommendations for the authors.

      (3) The authors should discuss more broadly the current state of anti-macrophage therapies in the clinic.

      Thank you for your suggestion. We have expanded the discussion of the clinical status of anti-macrophage therapies. Please see our responses under Recommendations for the authors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The text has grammatical and syntax errors that need to be corrected accordingly.

      Thank you for your suggestion. We have corrected the grammatical and syntactic errors and asked a native English speaker in the field to help polish the full text.

      Reviewer #2 (Recommendations for the authors):

      This work provides an important contribution to our further understanding of the role of macrophages in lung precancer/lung cancer evolution. I have several comments regarding how the manuscript could be improved:

      Introduction:

      The authors may consider citing the following work to enhance their work:

      (1) At line 78, where they talk about precancerous lesions being reversible, they should cite recent work on this in lung cancer: Teixeira et al 2019 PMID: 30664780, and Pennycuick et al 2020 PMID: 32690541.

      Thank you for your suggestion. We have cited the above references in the corresponding paragraph (page 4, line 76).

      (2) At line 96, where they talk about developing medicines for precancerous lesions, the authors should cite comprehensive review articles where this concept has been discussed in depth, for example: Reynolds et al 2023 PMID: 37067191, and Asad et al 2012 PMID: 23151603.

      Thank you for your suggestion. We have cited the above references in the corresponding paragraph (page 5, line 94).

      Results:

      (1) Line 142, the authors say "mice were feed for 12-16 months" - do they mean the mice were maintained for 12-16 months?

      Thank you for your comment. To best mimic the process of human lung cancer development, A/J mice with the highest incidence of spontaneous lung tumors, which increases substantially with age, were selected. The corresponding description has been modified as: “A/J mice have the highest incidence of spontaneous lung tumors among various mouse strains, and this probability significantly increased with age (Landau, Wang, Yang, Ding, & Yang, 1998). To more comprehensively mirror the tumor initiation and progression process of human lung cancer, A/J mice were maintained for 12-16 months for spontaneous lesions, which resulted in three recognizable precancerous lesions in the lung.” (page 7, line 138).

      (2) Line 143, the authors claim to have seen "three recognizable precancerous and cancerous lesions in the lung" but then, they only go on to describe AAH, adenoma, and AIS, lesions which are all commonly recognized as precancers. What was the cancerous (i.e. invasive) lesion they identified?

      Thank you for your comment. We apologize for this misstatement and will include cancerous lesions from mice for parallel analysis in a subsequent study. The corresponding description has been revised as: “To more comprehensively mirror the tumor initiation and progression process of human lung cancer, A/J mice were maintained for 12-16 months for spontaneous lesions, which resulted in three recognizable precancerous lesions in the lung.” (page 7, line 140).

      (3) Line 172, the authors say that the "proportion of cell types across the four stages showed a dynamic trend" ... what does this mean? A trend towards what exactly?

      Thank you for your comment. Our intention was to highlight heterogeneous changes, and the description has been corrected: “The proportion of cell types across the four stages showed irregular changes, while transcriptional homogeneity was reduced with precancerous progression, illustrating the importance of heterogeneity in tumorigenesis and also proving the reliability of the sampling in this study.” (page 8, line 169).

      (4) Line 193, the authors say cell communication "showed a tendency to malignant transformation." What does this statement mean? If they mean more cell communication occurred in the malignant lesions than the precancerous, then there is a flaw in the logic because AAH, adenoma, and AIS are all precancerous lesions. What is the sequence of evolution to malignancy the authors are assuming? Do they mean AIS is a more advanced stage of precancerous malignancy than adenoma, and adenoma is more advanced than AAH (albeit they are all precancerous lesions).

      Thank you for your comments. Malignant transformation involves multiple stages, and histological AAH is regarded as the beginning of this process. Precancerous lesions of LUAD in mice are believed to develop stepwise from AAH to adenoma to AIS, although progression does not necessarily follow this sequence in every case [PMID: 11235908, 32707077]. What we meant to describe was a gradual increase in the frequency of cell communication along this sequence. The corresponding description has been modified as: “At the evolutionary stages of precancerous LUAD, despite possible sample heterogeneity and other interference, we observed increased interactions between epithelial cells and surrounding stromal and immune cells in the microenvironment, indicating gradually frequent cell-cell communication during this process” (page 8, line 187).

      (5) Immunofluorescence images in Figure 3G and Figure 4F are captured at low magnification, making it very difficult to evaluate the colocalisation data. Suggest authors provide higher magnification images.

      Thank you for your suggestion. We have replaced the immunofluorescence images in Figure 3G and Figure 4F with higher magnification images.

      (6) Line 284 when referencing the cell line here, the author should make it clear in the text that cells were transfected with a construct expressing S100A4. If possible, would be good to understand if the level of S100A4 expression achieved is less, similar, or greater than that seen in these cells in vivo.

      Thank you for your suggestion. We have amended the text to make this clear: “S100a4-overexpressed (OE) alveolar macrophages were established by transfection of the mS100a4 vector into the murine MH-S cell line, and empty vector was transfected as negative control (NC) cells” (page 12, line 284). In future work, we will determine whether the level of S100a4 expression achieved is lower than, similar to, or higher than that seen in these cells in vivo.

      (7) Line 285 - when the authors first refer to OE cells that have been transfected, they should also inform the reader what NC cells are, i.e., negative control cells.

      Thank you for your suggestion. We have revised the relevant content as follows: “S100a4-overexpressed (OE) alveolar macrophages were established by transfection of the mS100a4 vector into the murine MH-S cell line, and empty vector was transfected as negative control (NC) cells” (page 12, line 284).

      (8) Line 324 - the authors claim they have demonstrated that the macrophages promote angiogenesis through upregulation of fatty acid metabolism. Whilst they may have demonstrated changes in fatty acid metabolism, no experiments assessing the effect of the macrophages in angiogenesis assays are included in the paper, so the authors should modify this statement.

      Thank you for your comments. We have added the relevant experiments as you suggested. First, we demonstrated in vitro the upregulation of fatty acid metabolism in S100a4<sup>+</sup> alv-macro and, through rescue experiments, uncovered the contribution of CPT1A to angiogenesis and cell transformation. Next, HUVEC tube formation and cytokine assays confirmed the pro-angiogenic effect of S100a4<sup>+</sup> alv-macro. We have added a Results subsection titled “S100a4<sup>+</sup> alv-macro drove angiogenesis by promoting Cpt1a-mediated fatty acid metabolism” (page 13, line 327) and added the following to the Discussion: “We demonstrated the regulation of fatty acid metabolism by CPT1A in S100a4<sup>+</sup> alv-macro as well as the involvement of PPAR-γ. Nevertheless, the molecular mechanism that drives the acquisition of metabolic and functional switching properties specific to this cell state still requires further characterization in the context of precancerous lesions. It has been reported that CD36 is the main effector of the S100A4/PPAR-γ pathway, and its mediated fatty acid uptake plays an important role in the tumor-promoting function of macrophages (S. Liu et al., 2021).” (page 18, line 433).

      Details of all methods used in these experiments have been added to the Materials and methods.

      (9) Regarding angiogenesis in precancerous lesions and the role of macrophages in this process: is there even any evidence that precancerous LUAD lesions are angiogenic? Don't these lesions typically have a lepidic pattern, wherein the cancer cells merely co-opt pre-existing alveolar capillaries without the need to generate new vessels?

      Thank you for your comments. As you mentioned, precancerous LUAD lesions pathologically show mainly a lepidic growth pattern, characterized by the growth of type II alveolar epithelial cells along pre-existing alveolar walls [PMID: 29690599], but this does not mean that the process occurs without the formation of new blood vessels. Tumor angiogenesis can follow multiple patterns. Some studies have shown increased angiogenesis in certain precancerous lesions, suggesting that angiogenesis may play an important role in the early stages of lung cancer development. Microvessel density (MVD) is increased in AAH and AIS compared with normal lung tissue, indicating that new blood vessels are forming to supply the essential nutrients and oxygen that support tumor cell growth. Pro-angiogenic factors such as VEGF, which promote the formation of new blood vessels by stimulating endothelial cell proliferation and migration, are usually upregulated [PMID: 39570802, 14568684]. In addition, macrophage infiltration into precancerous areas in response to cytokines has been shown to trigger the tumor angiogenic switch and to maintain continuous tumor-associated angiogenesis [PMID: 35022204]. Our in vitro tube formation and cytokine assays also demonstrated angiogenesis induced by S100a4<sup>+</sup> alv-macro. We have discussed this in the manuscript (page 19, line 449) and will provide more robust evidence in future work.

      Discussion:

      Perhaps the authors can cite any literature pertaining to the current wave of anti-macrophage therapies currently being tested in the clinic. Moreover, have these therapies been tested in lung cancer, and if so, what were the results?

      Thank you for your suggestion. At present, clinical trials of anti-macrophage therapies mainly involve Gaucher's disease and hematological malignancies, and the two trials related to lung cancer have not posted valid data. Nevertheless, several preclinical studies are informative. We have cited the relevant literature and discussed it in detail: “With the elaborate resolution of TME, macrophage-related therapy is considered to be promising. So far, macrophage-targeted therapy has demonstrated clinical efficacy in Gaucher's disease and advanced hematological malignancies (Barton et al., 1991; Ossenkoppele et al., 2013). In lung cancer, an attempt to enhance anti-PD-1 therapy in NSCLC by depleting myeloid-derived suppressor cells with gemcitabine was prematurely terminated because of insufficient data collected; another clinical trial of TQB2928 monoclonal antibody promoting macrophage phagocytosis of tumor cells in combination with a third-generation EGFR TKI for advanced NSCLC is now recruiting. Moreover, preclinical studies on macrophage-targeted therapy combined with immune checkpoint inhibitors are being extensively conducted in NSCLC, and it was suggested that blockade of purine metabolism can reverse macrophage immunosuppression, and a synergetic effect can be achieved when combined with anti-PD-L1 therapy, which inspired the direction of our early intervention strategies (H. Wang, Arulraj, Anbari, & Popel, 2024; Yang et al., 2025).” (page 20, line 479).

      Methods:

      Further description of how lesions were classified as precancerous (AAH, adenoma, AIS) or cancerous by the pathologist should be defined (or cite appropriate reference where this is described).

      Thank you for your suggestion. We have cited relevant references in the Methods section (page 21, line 528) on how lesions were classified by the pathologists [PMID: 21252716, 28951454, 32707077, 24811831].

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study combines predictions from MD simulations with sophisticated experimental approaches including native mass spectrometry (nMS), cryo-EM, and thermal protein stability assays to investigate the molecular determinants of cardiolipin (CDL) binding and binding-induced protein stability/function of an engineered model protein (ROCKET), as well as of the native E. coli intramembrane rhomboid protease, GlpG.

      Strengths:

      State-of-the-art approaches and sharply focused experimental investigation lend credence to the conclusions drawn. Stable CDL binding is accommodated by a largely degenerate protein fold that combines interactions from distant basic residues with greater intercalation of the lipid within the protein structure. Surprisingly, there appears to be no direct correlation between binding affinity/occupancy and protein stability.

      Weaknesses:

      (i) While aromatic residues (in particular Trp) appear to be clearly involved in the CDL interaction, there is no investigation of their roles and contributions relative to the positively charged residues (R and K) investigated here. How do aromatics contribute to CDL binding and protein stability, and are they differential in nature (W vs Y vs F)?

      Based on the simulations in Corey et al (Sci Adv 2021), aromatic residues, especially tryptophan, appear to help provide a binding platform for the glycerol moiety of CDL which is quite flat. This interaction is likely why we generally see the tryptophan slightly further into the plane of the membrane than the basic residues, where it may help to orient the lipid. Unlike charge interactions with lipid head groups, such subtle contributions are likely distorted by the transfer to the gas phase, making it difficult to confidently assign changes in stability or lipid occupancy to interactions with tryptophan. We have added an explanation of these considerations to the Discussion section (page 13, last paragraph).

      (ii) In the case of GlpG, a WR pair (W136-R137) present at the lipid-water interface on the periplasmic face (adjacent to helices 2/3) may function akin to the W12-R13 of ROCKET in specifically binding CDL. Investigation of this site might prove interesting if it indeed does.

      Thank you for the suggestion. In our CG simulations, we don’t see significant CDL binding at this site, likely because it contains just a single basic residue. We note that there is a nearby periplasmic site with two basic residues (K132+K191+W125) that has a higher occupancy, although still far lower than that of the identified cytoplasmic site. In general, periplasmic sites are less common and/or have lower affinity, which may be related to leaflet asymmetry (Corey et al, Sci Adv 2021). We added the CDL density plot for the periplasmic side to Figure S7 and noted this on page 9, next-to-last paragraph.

      (iii) Examples of other native proteins that utilize combinatorial aromatic and electrostatic interactions to bind CDL would provide a broader perspective of the general applicability of these findings to the reader (e.g., the adenine nucleotide translocase (ANT/AAC) of the mitochondria as well as the mechanoenzymatic GTPase Drp1 appear to bind CDL using the common "WRG" motif).

      Several confirmed examples are presented in Corey et al (Sci Adv 2021), the dataset we used to identify the CDL site in GlpG. Essentially, our broader perspective is that we test, in an artificial system, the common features observed in native proteins. While it is not clear how a peripheral membrane protein like Drp1 fits into this framework, the CDL binding sites in ANTs indeed have the same hallmarks as the one in GlpG (Hedger et al, Biochemistry 2016). We recently contributed to a study demonstrating that the tertiary structure of ANT Aac2 is stabilized by co-purified CDL molecules, underscoring the general validity of our findings (Senoo et al, EMBO J 2024). We have added this information to the Discussion (page 12, third paragraph) and added a figure (S8, see below) to highlight the architecture of the Aac2-CDL complex.

      Overall, using both model and native protein systems, this study convincingly underscores the molecular and structural requirements for CDL binding and binding-induced membrane protein stability. This work provides much-needed insight into the poorly understood nature of protein-CDL interactions.

      We thank the reviewer for the positive assessment!

      Reviewer #2 (Public review):

      Summary:

      The work in this paper discusses the use of CG-MD simulations and nMS to describe cardiolipin binding sites in a synthetically designed protein that can be extrapolated to a naturally occurring membrane protein. While the authors acknowledge their work illuminates the challenges in engineering lipid binding, they are able to describe some features that highlight residues within GlpG that may be involved in lipid regulation of protease activity, although further study of this site is required to confirm its role in protein activity.

      Comments

      The discrepancy between total CDL binding in CG simulations (Fig 1d) and nMS (Fig 2b,c) should be further discussed. Could limitations of the nMS methodology select for the most tightly bound lipids?

      We thank the reviewer for pointing out that this needs to be clarified. We analyze proteins in detergent, which is in itself delipidating because detergent molecules compete with the lipids for binding to the protein, an effect that can be observed in MS (Bolla et al, Angew Chemie Int. Ed. 2020). Native MS of membrane proteins requires stripping of the surrounding lipid vesicle or detergent micelle in the vacuum region of the mass spectrometer, which is done through gentle thermal activation in the form of high-energy collisions with gas molecules. Detergent molecules and lipids not directly in contact with the protein generally dissociate more easily than bound lipids (Laganowsky et al, Nature 2014); however, even loosely bound lipids can readily dissociate together with the detergent, artificially reducing occupancy. The nMS data are therefore likely biased towards tightly bound lipids (e.g., via electrostatic headgroup interactions). However, these are precisely the lipids we are interested in, so MS is suitable here. We have noted this in the Discussion, last paragraph on page 12.

      Mutation of helical residues to alanine not only results in loss of lipid binding residues but may also impact overall helix flexibility, is this observed by the authors in CG-MD simulations? Change in helix overall RMSD throughout simulation? The figures shown in Fig.1H show what appear to be quite significant differences in APO protein arrangement between ROCKET and ROCKET AAXWA.

      For most of the study, we use CG with fixed backbone bead properties as well as an elastic network to maintain tertiary structure. This means that a mutation to alanine will have essentially no impact on the stability of the helix or the protein in general in the CG simulations in the bilayer. It should be noted that Figure 1H shows snapshots from atomistic gas-phase simulations with a pulling force applied (see schematic in Figure 1F, as well as Figure S1 for end-point structures), where we naturally expect large structural changes due to unfolding. We have analyzed the helix content in the gas-phase simulations and see that helix 1 in ROCKET unwinds within 10 ns but stays helical ca. 10 ns longer when bound to CDL. The AAXWA mutation stabilizes the helical conformation independently of CDL binding, but CDL tethers the folded helix closer to the core (see Figure 1 G and H). We have added this information to the Results section and the plot below to Figure S2.

      CG-MD force experiments could be corroborated experimentally with magnetic tweezer unfolding assays, as has been performed for the unfolding of the artificial protein TMHC2. Alternatively, this work could benefit from referencing Wang et al 2019 "On the Interpretation of Force-Induced Unfolding Studies of Membrane Proteins Using Fast Simulations" to support MD vs experimental values.

      We apologize for the confusion here. The force experiments are gas-phase all-atom MD. The simulations show that the protein-lipid complex has a more stable tertiary structure in the gas phase. Since these are gas-phase simulations, they cannot be corroborated using in-solution measurements. Similarly, the paper by Wang et al is a great reference for solution simulations, however, to date the only validations for gas-phase unfolding come from native MS.

      Did the authors investigate if ROCKET or ROCKETAAXWA copurifies with endogenous lipids? Membrane proteins with stabilising CDL often copurify in detergent and can be detected by MS without the addition of CDL to the detergent solution. Differences in retention of endogenous lipid may also indicate differences in stability between the proteins and is worth investigation.

      We have investigated the co-purification of the ROCKET variants and did not observe any co-purified lipids (see Figure S4), which we now clarify in the Results section (page 5, third paragraph). We previously showed that long residence times in CG-MD are linked to the observation of co-purified lipids, because they are not easily outcompeted by the detergent (Bolla et al, Angew Chemie Int. Ed. 2020). In CG-MD of ROCKET, we see that although the CDL sites are nearly constantly occupied, the CDL molecules are in rapid exchange with free CDL from the bulk membrane. For MS, all ROCKET proteins were extracted from the E. coli membrane fraction with DDM, which likely outcompetes CDL. This interpretation would explain why we see significant CDL retention when the protein is released from liposomes, but not when the protein is first extracted into detergent. For GlpG, CDL residence times in CG-MD are longer, which agrees with CDL co-purification. Similarly, there is a clear enrichment of CDL when the protein is extracted into nanodiscs (Sawczyc et al, Nature Commun 2024).

      Do AAXWA and ROCKET have similar intensities in nMS? AAXWA appears to show slightly lower intensities than ROCKET.

      We did not observe a significant difference; however, in most spectra, the AAXWA peaks have a lower intensity than those of the other variants (see e.g. Figure S5). While this could reflect batch-to-batch variation, there may be a small contribution from the lower number of basic residues (see Abramsson et al, JACS Au 2021). However, there is an excess of basic residues in the soluble domain of ROCKET, so this interpretation is speculative.

      Can the authors extend their comments on why densities are observed only around site 2 in the cryo-EM structures when site 1 is the apparent preferential site for ROCKET?

      We base the lipid preference of Site 1 > Site 2 on the CG MD data, where we see a higher occupancy for site 1. At the same time, as noted in the text, CDL at both sites has rather short residence times. When the protein is solubilized in detergent, these times can change, and lipids in less accessible sites (such as cavities and subunit interfaces) may exchange more slowly than those that are fully exposed to the micelle (Bolla et al, Angew Chemie Int. Ed. 2020). We speculate that this effect may favor retention of a lipid at site 2. Furthermore, site 1 is flexible, with CDL attaching at various angles, while site 2 has more uniform CDL orientations (see CDL density plot in Figure 1D). EM is likely biased towards the less flexible site. Notably, the density is still poorly defined, so it is possible that a more variable lipid position in site 1 would not yield a notable density at all. We have added this information to the Results section (page 5, second paragraph).

      The authors state that nMS is consistent with CDL binding preferentially to Site 1 in ROCKET and preferentially to Site 2 in the ROCKET AAXWA variant, yet it is unclear from the text exactly how these experiments demonstrate this.

      As outlined in the previous answer, we base our assessment of the sites on the CG MD simulations. There, we note that CDL binds predominantly to site 1 in ROCKET and predominantly to site 2 in AAXWA; however, the overall occupancy is lower in AAXWA than in ROCKET, meaning fewer lipids will be bound simultaneously in that variant. The nMS data show CDL retention by both variants when released from liposomes, but AAXWA has lower-intensity CDL adduct peaks (Figure 2B, C). We interpret this to mean that both variants have CDL sites, but in the AAXWA variant the sites have lower occupancy. We agree that this observation does not demonstrate that the CG MD data are correct; however, it is the outcome one expects based on the simulations, so we described it as “consistent with the simulations”. We have rephrased the section to make this clear.

      As was done for ROCKET AAXWA, measuring the total CDL binding to A61P and R66A would strengthen the characterisation of lipid-stabilising mutations.

      We considered this possibility too. Unfortunately, the mass differences between A61P / R66A and AAXWA are slightly too high to unambiguously resolve the CDL adducts of each variant, as the first CDL peak of AAXWA partially overlaps with the apo peak of A61P or R66A.

      Did the authors investigate a double mutation to Site 2 (e.g. R66A + M16A)?

      While designing mutants, we tested several double mutants involving the basic residues that bind the CDL headgroups (e.g. R66 + AAXWA) but found that they could not be purified, probably because a minimum number of positive residues at the N-terminus is required for proper membrane insertion and folding. M16 is an interesting suggestion but wasn’t considered, because the more subtle effects of uncharged amino acids on CDL binding may be lost during desolvation (see also our response to Comment (i) from reviewer 1).

      Was the stability of R66A ever compared to the WT or only to AAXWA?

      Some of the ROCKET mutants have masses too similar to be resolved well on the ToF instrument. While the R66A-WT comparison is possible, we would not be able to compare it to A61P or D7A/S8R. To avoid three-point comparisons, we selected AAXWA as the common point of reference for all variants.

      How many CDL sites in the database used are structurally verified?

      At the time, 1KQF was the only verified E. coli protein with a CDL resolved in a high-resolution structure. The complex was predicted accurately, see Figure 6A in Corey et al (Sci Adv 2021), as were several non-E. coli complexes.

      The work on GlpG could benefit from mutagenesis or discussion of mutagenesis to this site. The Y160F mutation has already been shown to have little impact on stability or activity (Baker and Urban Nat Chem Biol. 2012).

      We thank the referee for their excellent suggestion. While Y160F did not have a pronounced effect, the other 3 positions of the predicted CDL binding site in GlpG have not been covered by Baker and Urban. Looking at sequence conservation in GlpG orthologs, manually sampling down to 50% identity (~1300 sequences in Uniprot) shows that Y160 and K167 are conserved, R92 varies between K/R/Q, whereas W98 is not conserved. The other (weak) site cited above (K132 and K191) is not conserved. A detailed investigation of how the conserved residues impact CDL binding and activity is already planned for a follow up study focusing on GlpG biology.

      Reviewer #3 (Public review):

      Summary:

      The relationships of proteins and lipids: it's complicated. This paper illustrates how cardiolipins can stabilize membrane protein subunits - and not surprisingly, positively charged residues play an important role here. But more and stronger binding of such structural lipids does not necessarily translate to stabilization of oligomeric states, since many proteins have alternative binding sites for lipids which may be intra- rather than intermolecular. Mutations which abolish primary binding sites can cause redistribution to (weaker) secondary sites which nevertheless stabilize interactions between subunits. This may be at first sight counterintuitive but actually matches expectations from structural data and MD modelling. An analogous cardiolipin binding site between subunits is found in E.coli tetrameric GlpG, with cardiolipin (thermally) stabilizing the protein against aggregation.

      “It’s complicated”: we could not have phrased the main conclusions of our study better.

      Strengths:

      The use of the artificial scaffold allows testing of hypotheses about the different roles of cardiolipin binding. It reveals effects which are at first sight counterintuitive and are explained by the existence of a weaker, secondary binding site which, unlike the primary one, allows easy lipid-mediated interaction between two subunits of the protein. Introducing different mutations either changes the balance between primary and secondary binding sites or introduces a kink in a helix, thus affecting subunit interactions, which are experimentally verified by native mass spectrometry.

      Weaknesses:

      The artificial scaffold is not necessarily reflecting the conformational dynamics and local flexibility of real, functional membrane proteins. The example of GlpG, while also showing interesting cardiolipin dependency, illustrates the case of a binding site across helices further but does not add much to the main story. It should be evident that structural lipids can be stabilizing in more than one way depending on how they bind, leading to different and possibly opposite functional outcomes.

      We share the reviewer’s concern, as we clearly observe that TMHC4_R does not have the same type of flexibility as a natural protein. We find that by introducing flexibility, we start to see CDL-mediated effects. To test the validity of our findings from the artificial system, we apply them to GlpG. In response to a suggestion from Reviewer 1, we also compared the findings to Aac2 and found that its stabilizing CDL site closely resembles that in GlpG (see new Figure S8).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor comments:

      There are a number of typos/uncorrected statements in the text.

      i) The last sentence of the Abstract appears to be an uncorrected mishmash of two.

      ii) Line 66: "protects" should be just "protect"

      iii) Line 75: Sentence appears to be incomplete. "...associated changes in protein stability." The word "stability" is missing.

      We have made these changes.

      iv) Fig. 2E. Are the magenta and blue colors inverted for variants 1 and 2?

      No, the colors are correct. Greater stabilization of the blue tetramer (AAXAW) compared to WT (purple) will lead to fewer blue monomers than purple monomers in the mass spectrum.

      v) Line 274: the salt bridge should be between R8-E68.

      We have corrected this.

      vi) Lines 350-354 (final sentence of the paragraph): The sentence does not read well (especially with the double negative element). Please reconstruct the sentence and/or break it into two. 

      We have split the sentence in two.

      Suggestions:

      (i) While aromatic residues (in particular Trp) appear to be clearly involved in the CDL interaction, there is no investigation of their roles and contributions relative to the positively charged residues (R and K) investigated here. How do aromatics contribute to CDL binding and protein stability, and are they differential in nature (W vs Y vs F)?

      See our response to comment (i) from Reviewer 1. In short, subtle contributions to lipid interactions (such as pi stacking with Trp or Tyr) will likely be lost during transfer to the gas phase. However, as noted in our response to the last comment from Reviewer 2, we plan to use solution-phase activity assays to investigate the effect of Trp on CDL binding to GlpG. This is, however, beyond the scope of the current study.

      (ii) In the case of GlpG, a WR pair (W136-R137) present at the lipid-water on the periplasmic face (adjacent to helices 2/3) may function akin to the W12-R13 of ROCKET in specifically binding CDL. Investigation of this site might prove to be interesting if it indeed does.

      We added the CDL density plot for the periplasmic side to Figure S7 and discuss further sites in GlpG in the Discussion section. See response to point (ii) above for details.

      Reviewer #2 (Recommendations for the authors):

      Minor comments

      - Typo in abstract line 39-40

      - Typo in figure legend of Fig 1 line 145

      - Typo in line 149, missing R66 in residues shown as sticks description

      - Lines 165-167 could benefit from describing what residues are represented as sticks

      We have made these changes.

      - Line 263 should refer to the figure where the tetrameric state was not affected by this mutation.

      The full spectrum of the A61P mutant is not included in the figures; hence, there is no figure to reference.

      - Addition of statistics to Fig. 4F ?

      We have added significance indicators to the graph and information about the statistics to the legend.

      Reviewer #3 (Recommendations for the authors):

      Minor issues

      l39: rewrite

      We have made these changes.

      l60: provide evidence for what is presented as a general statement - cardiolipins might also regulate function without affecting oligomeric state, e.g. MgtA

      This is a good point. We have added references to two examples where CDL acts without affecting oligomerization (MgtA, Weikum et al., BBA 2024; and Aac2, Senoo et al., EMBO J 2024).

      l74: not every functional interaction comes with a thermal shift

      We use thermal shift as a proxy because it indicates tight interactions, even if they may not be functional. We have made this distinction clearer in the text.

      l78: this is true for electrostatic interactions such as are at play here, but not necessarily for hydrophobic ones

      l133: in what direction is the pulling force applied - the figure seems to suggest diagonally?

      The pull coordinate is defined as the distance between the centers of mass of the two helices. The direction of the pull coordinate in Cartesian coordinate space is thus not fixed.
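      To illustrate why a center-of-mass distance has no fixed Cartesian direction, here is a minimal sketch under stated assumptions: the helper names, toy coordinates, and masses are hypothetical and this is not the simulation setup used in the study. The pulled quantity is a scalar distance; the vector along which it acts is recomputed from the current geometry.

```python
import math


def center_of_mass(coords, masses):
    """Mass-weighted mean position of a set of particles given as (x, y, z) tuples."""
    total = sum(masses)
    return tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)
    )


def pull_coordinate(helix1, masses1, helix2, masses2):
    """Return the scalar COM-COM distance and the unit vector it currently points along."""
    c1 = center_of_mass(helix1, masses1)
    c2 = center_of_mass(helix2, masses2)
    d = tuple(b - a for a, b in zip(c1, c2))
    dist = math.sqrt(sum(x * x for x in d))
    return dist, tuple(x / dist for x in d)


# Hypothetical toy "helices": same COM-COM distance in two different geometries.
helix1, masses1 = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 1.0]
dist_a, dir_a = pull_coordinate(helix1, masses1, [(1.0, 3.0, 4.0)], [12.0])
dist_b, dir_b = pull_coordinate(helix1, masses1, [(1.0, 4.0, 3.0)], [12.0])
print(dist_a, dir_a)  # distance 5.0, direction along (0, 0.6, 0.8)
print(dist_b, dir_b)  # distance 5.0, direction along (0, 0.8, 0.6)
```

      Both configurations have the same pull-coordinate value, yet the direction of separation differs, which is consistent with the figure suggesting a diagonal pull in one particular frame.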

      fig 1f, l159: "dissociating" meaning separation of subunits? the placement of the lipid within one subunit would not suggest that intermolecular interactions are properly represented here, please clarify

      The lipid placement in the schematic is not representative, since the lipid occupies different positions in WT and AAXWA; we have noted this in the legend. Regarding line 159, “dissociation” is not strictly correct, since we measure the force needed to separate helices 1 and 2, i.e., unfolding. We have changed the wording to “unfolding”.

      l173: was there any evidence in EM data for monomers or smaller oligomers?

      No smaller particles were identified by visual inspection or in the particle classes. We have noted this in the methods section.

      l203: were tetramer peaks isolated separately for CID?

      C8E4 can cause some activation-dependent charge reduction, which could allow some tetramers to “sneak out” of the isolation window. We used global activation without precursor selection which subjects all ions to activation.

      fig 2c: can you indicate the 3rd lipid binding as it seems to be in the noise

      We can unambiguously assign the retention of three CDL molecules only for the 17+ charge state, and have clarified this in the legend to Figure 2.

      fig3: can you pls clarify what is meant by stabilization here - less monomer in case A means a more stable oligomer, but "A > B" should lead to ratios < 50%. This does not help with understanding what "stabilization" means in panels c-f, please define what the y axis means for these. Please also explain the bottom panels (side view) in each case, what do the dots represent?

      We apologize for the oversight of not explaining the side views; we have added a legend. The schematic in panel A is correct (compare the schematic in Figure 2E). If tetramer A (blue) is stabilized by CDL more than tetramer B (“CDL stabilization A>B”), fewer monomers will be ejected from A. If there is less A monomer in the presence of CDL, the ratio B/(B+A) will go up.
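      The direction of the ratio can be checked with simple arithmetic. The sketch below uses a hypothetical helper name and made-up monomer intensities purely for illustration (they are not measured values): when CDL stabilizes tetramer A more than tetramer B, less A monomer is ejected, and the fraction B/(B+A) rises above its no-preference value of 0.5.

```python
def monomer_fraction_b(monomer_a, monomer_b):
    """Fraction of ejected-monomer signal that comes from tetramer B: B / (B + A)."""
    return monomer_b / (monomer_a + monomer_b)


# Hypothetical intensities: equal ejection when neither tetramer is preferentially stabilized.
no_preference = monomer_fraction_b(monomer_a=100.0, monomer_b=100.0)
# Hypothetical intensities: A is stabilized more, so fewer A monomers are ejected.
a_stabilized = monomer_fraction_b(monomer_a=40.0, monomer_b=80.0)

print(no_preference)  # 0.5
print(a_stabilized)   # about 0.667, i.e. above 50%
```

      In other words, "CDL stabilization A>B" corresponds to a ratio above 50%, not below it.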

      It is not very clear what consequences the kink introduced by proline has for intra- vs. intermolecular interactions - the cartoons don't help much here

      We agree, the A61P impact on the structure is subtle. The small kink it introduces is not really visible in the top view, and hence, we tried to emphasize this in the side view. We have clarified the meaning of the side view schematics in the legend.

      l360: is that an assumption made here or is there evidence for displacement? native MS could potentially prove this.

      This is an assumption based on the fact that we see very little binding of POPG in the mixed bilayer CG-MD. We have clarified this in the text. Measuring this with MS is an interesting idea, but we have no direct measurement of displacement, since addition of CDL and POPG to the protein in detergent would result in binding to other sites as well.

      fig 4d: there is not much POPG density visible at all - why is that?

      Both plots use the same absolute scale. There is simply much less POPG binding compared to CDL.

      fig 4e: is this released protein already dissociated into monomers due to denaturation or excessive energy (CID product) - please comment.

      The CID energy for the spectrum in Figure 4E was selected to show partial dissociation and monomer release at higher voltages (220V in this case). At lower voltages (150V-170V) we do not observe dissociation in C8E4, see Figure S4A.

      l363: pls comment on the apparent discrepancy between single lipid binding and double density

      We added a clarifying sentence regarding the double lipids. The density seen in the published structure is of four lipid tails next to each other, which is what one would expect for a CDL. Since the CDL could not be resolved unambiguously, two phospholipids with two acyl chains each were modeled into the density instead. Our MS and MD data strongly suggests that the density stems from a single CDL.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:  

      Reviewer #1 (Public Review):

      Strengths:

      The manuscript utilizes a previously reported misfolding-prone reporter to assess its behaviour in ER in different cell line models. They make two interesting observations:

      (1) Upon prolonged incubation, the reporter accumulates in nuclear aggregates.

      (2) The aggregates are cleared during mitosis. They further provide some insight into the role of chaperones and ER stressors in aggregate clearance. These observations provide a starting point for addressing the role of mitosis in aggregate clearance. Needless to say, going forward, understanding the impact of aggregate clearance on cell division will be equally important.

      Weaknesses:

      The study almost entirely relies on an imaging approach to address the issue of aggregate clearance. A complementary biochemical approach would be more insightful. The intriguing observations pertaining to aggregates in the nucleus and their clearance during mitosis lack mechanistic understanding. The issue pertaining to the functional relevance of aggregation clearance or its lack thereof has not been addressed. Experiments addressing these issues would be a terrific addition to this manuscript.

      We have performed protein blotting and proteomics to characterize ER-FlucDM-eGFP expressing cells. We have also provided evidence to support the role of ER reorganization in regulating aggregate clearance. Our proteomic analysis provided a global view of the cellular state of cells expressing ER-FlucDM-eGFP, which potentially revealed functional relevance of ER-FlucDM-eGFP. Details are explained in the following comments. 

      Reviewer #2 (Public Review):

      Summary:

      The authors provide an interesting observation that ER-targeted excess misfolded proteins localize to the nucleus within membrane-entrapped vesicles for further quality control during cell division. This is useful information indicating transient nuclear compartmentalization as a quality control strategy for misfolded ER proteins in mitotic cells, although endogenous substrates of this pathway are yet to be identified.

      Strengths:

      This microscopy-based study reports unique membrane-based compartments of ER-targeted misfolded proteins within the nucleus. Quarantining aggregating proteins in membrane-less compartments is a widely accepted protein quality control mechanism. This work highlights the importance of membrane-bound quarantining strategies for aggregating proteins. These observations open up multiple questions on proteostasis biology. How do these membrane-bound bodies enter the nucleus? How are the single-layer membranes formed? How exactly are these membrane-bound aggregates degraded? Are similar membrane-bound nuclear deposits present in post-mitotic cells that are relevant in age-related proteostasis diseases? Etc. Thus, the observations reported here are potentially interesting.

      Weaknesses:

      This study, like many other studies, used a set of model misfolding-prone proteins to uncover the interesting nuclear-compartment-based quality control of ER proteins. The endogenous ER-proteins that reach a similar stage of overdose of misfolding during ER stress remain unknown.

      We have included a previous study that showed accumulation of BiP aggregates in the nucleus upon overexpression of BiP (Morris et al., 1997; DOI: 10.1074/jbc.272.7.4327) in the discussion (Line 299).

      The mechanism of disaggregation of membrane-trapped misfolded proteins is unclear. Do these come out of the membrane traps? The authors report a few vesicles in living cells. This may suggest that membrane-untrapped proteins are disaggregated while trapped proteins remain aggregates within membranes.

      We initially made mStayGold-Sec61β to image the ER structures and ER-FlucDM-eGFP aggregates. However, we could not obtain convincing time-lapse images to show the release of ER-FlucDM-eGFP aggregates from the ER membrane as there are abundant ER structures present close to the aggregates during mitosis, preventing the differentiation of the membrane encapsulating aggregates from the ER structures. 

      The authors figure out the involvement of proteasome and Hsp70 during the disaggregation process. However, the detailed mechanisms including the ubiquitin ligases are not identified. Also, is the protein ubiquitinated at this stage?

      We performed cycloheximide chase experiments in cells released from the G2/M and found that ER-FlucDM-eGFP protein level did not fluctuate significantly when cells progressed through mitosis and cytokinesis. Thus, we did not consider protein ubiquitination and degradation of ER-FlucDM-eGFP as a major mechanism for its clearance. We have included this observation in the results (Figure S7A; Line 266) and in the discussion (Line 324) of the revised manuscript.

      This paper suffers from a lack of cellular biochemistry. Western blots confirming the solubility and insolubility of the misfolded proteins are required. This will also help to calculate the specific activity of luciferase more accurately than estimating the fluorescence intensities of soluble and aggregated/compartmentalized proteins. 

      We performed a solubility test in cells expressing ER-FlucDM-eGFP and detected insoluble ER-FlucDM-eGFP after heat stress (Figure S1E; Line 102). We have also performed protein blotting to detect ER-FlucDM-eGFP to normalize the luciferase activity (Line 609). We have updated the methods section for luciferase measurement (Line 494).

      Microscopy suggested the dissolution of the membrane-based compartments and probably disaggregation of the protein. This data should be substantiated using Western blots. Degradation can only be confirmed by Western blots. The authors should try time course experiments to correlate with microscopy data. Cycloheximide chase experiments will be useful.

      We performed cycloheximide chase experiments in cells released from the G2/M and found that ER-FlucDM-eGFP protein level did not fluctuate significantly when cells progressed through mitosis and cytokinesis (Figure S7A to S7C). Also, live-cell imaging of cells released from the G2/M indicated no significant change of total fluorescence intensity of ER-FlucDM-eGFP (Figure S7D). Thus, we do not think that protein degradation of ER-FlucDM-eGFP is the major mechanism for its clearance.

      The cell models express the ER-targeted misfolded proteins constitutively that may already reprogram the proteostasis. The authors may try one experiment with inducible overexpression.

      We have re-transduced fresh MCF10A cells with lentiviral particles to induce expression of ER-FlucDM-eGFP. The aggregates started to form after 24 h post-transduction. We made similar observations as described in the manuscript (e.g. aggregate clearance) two days after re-transduction.

      It is clear that a saturating dose of ER-targeted misfolded proteins activates the pathway.

      The authors performed a few RT-PCR experiments to indicate the proteostasis-sensitivity.

      Proteome-based experiments will be better to substantiate proteostasis saturation.

      We have performed proteomic analysis in cells expressing ER-FlucDM-eGFP and observed up-regulation of multiple proteins involved in the ER stress response, indicating that cells expressing ER-FlucDM-eGFP experience proteostatic stress (Figure S4A; Line 179).  

      The authors should immunostain the nuclear compartments for other ER-membrane resident proteins that span either the bilayer or a single layer. The data may be discussed.

      We have co-expressed ER-FlucDM-mCherry and mStayGold-Sec61β and detected mStayGold-Sec61β around ER-FlucDM-mCherry aggregates (Figure 1B).

      All microscopy figures should include control cells with similarly aggregating proteins or without aggregates as appropriate. For example, is the nuclear-targeted FlucDM-EGFP similarly entrapped? A control experiment will be interesting. Expression of control proteins should be estimated by western blots.

      We targeted FlucDM-eGFP to the nucleus by expressing NLS-FlucDM-eGFP (Figure S1A). We found that the nuclear FlucDM-eGFP did not co-localize with the ER-FlucDM-mCherry aggregates (Figure S1B; Line 96). We have also determined the expression levels of NLS-FlucDM-eGFP and ER-FlucDM-mCherry (Figure S1C and S1D).

      There are few more points that may be out of the scope of the manuscript. For example, how do these compartments enter the nucleus? Whether similar entry mechanisms/events are ever reported? What do the authors speculate? Also, the bilayer membrane becomes a single layer. This is potentially interesting and should be discussed with probable mechanisms. Also, do these nuclear compartments interfere with transcription and thereby deregulate cell division? What about post-mitotic cells? Similar deposits may be potentially toxic in the absence of cell division. All these may be discussed.

      Thank you for these interesting suggestions for our study. We speculate that ER-FlucDM-eGFP aggregates may derive from invagination of the inner nuclear membrane, given that the aggregates are in close proximity to the inner nuclear membrane in interphase cells (Line 299). We have included a previous study that reported a similar aggregate upon BiP overexpression (Morris et al., 1997; DOI: 10.1074/jbc.272.7.4327; Line 300). Our proteomic analysis showed that cells expressing ER-FlucDM-eGFP have several up-regulated proteins related to cell cycle regulation (Figure S4A; Line 346).

      Reviewer #3 (Public Review):

      Summary:

      This paper describes a new mechanism of clearance of protein aggregates occurring during mitosis.

      The authors have observed that animal cells can clear misfolded aggregated proteins at the end of mitosis. The images and data gathered are solid, convincing, and statistically significant. However, there is a lack of insight into the underlying mechanism. They show the involvement of the ER, ATPase-dependent, BiP chaperone, and the requirement of Cdk1 inactivation (a hallmark of mitotic exit) in the process. They also show that the mechanism seems to be independent of the APC/C complex (anaphase-promoting complex). Several points need to be clarified regarding the mechanism that clears the aggregates during mitosis:

      • What happens in the cell substructure during mitosis to explain the recruitment of BiP towards the aggregates, which seem to be relocated to the cytoplasm surrounded by the ER membrane.

      We have included images to show that BiP co-localizes with ER-FlucDM-eGFP aggregates in interphase cells (Figure S5C). We think that BiP participates in the formation of ER-FlucDM-eGFP aggregates during interphase instead of being recruited to the aggregates during mitosis.

      • How the changes in the cell substructure during mitosis explain the relocation of protein aggregates during mitosis.

      We provided evidence to show that clearance of ER-FlucDM-eGFP aggregates involves the ER remodeling process. We depleted ER membrane fusion proteins ATL2 and ATL3 to perturb the distribution of ER sheets or tubules and found that cells were defective in clearing the aggregates (Figure 7A and B; Line 278). 

      • Why BiP seems to be the main player of this mechanism and not the cyto Hsp70 first described to be involved in protein disaggregation.

      In our proteomic analysis, we found that BiP (HSPA5) but not other Hsp70 family members were up-regulated in cells expressing ER-FlucDM-eGFP (Line 352; Figure S4A). This explains why BiP is the main player of the ER-FlucDM-eGFP aggregate clearance.  

      Strengths:

      Experimental data showing clearance of protein aggregates during mitosis is solid, statistically significant, and very interesting.

      Weaknesses:

      Weak mechanistic insight to explain the process of protein disaggregation, particularly the interconnection between what happens in the cell substructure during mitosis to trigger and drive clearance of protein aggregates.

      In our revised manuscript, we now provide evidence that ER-FlucDM-eGFP aggregate clearance involves remodeling of the ER structures during mitotic exit. This is added as a new Figure 7 in the revised manuscript and is described in the results section (Line 278) and in the discussion section (Line 323). We believe that this addition provides mechanistic insight into ER-FlucDM-eGFP aggregate clearance.

      Recommendations for the authors:

      Reviewing Editor comments:

      I have read these reviews in detail and would like to recommend that the authors perform the experiments according to the reviewers' suggestions, as well as provide the appropriate controls raised by the reviewers.

      I think there are not that many requests and they all seem very reasonable and easily doable. I would recommend that the authors carry out the suggested experiments to develop a stronger story where the evidence transitions from being incomplete presently to a "more complete" standard.

      We have addressed questions raised by three reviewers and updated our manuscript (labeled in red in the main text).

      Reviewer #1 (Recommendations For The Authors):

      The manuscript makes exciting observations about the accumulation of reporter protein aggregates in the nucleus and their clearance during mitosis. It also provides some insight into the role of chaperones in aggregate clearance. These observations provide a good platform to perform in-depth analysis of the underlying mechanism and its functional relevance, which perhaps the authors plan over the long term. However, the suggestions below will help improve the current version of the manuscript:

      (1) Although it is assumed that the aggregates are cleared by the protein degradation mechanism, clear evidence supporting this assumption in the authors' experiments is lacking and needs to be provided. Is it possible that mitosis induces disassembly of these aggregates instead of degradation?

      We performed two experiments to verify whether ER-FlucDM-eGFP aggregates are cleared by the protein degradation mechanism. In the first experiment, we treated cells expressing ER-FlucDM-eGFP released from the G2/M boundary with cycloheximide (CHX) and found that ER-FlucDM-eGFP did not decrease in protein abundance in cells progressing through mitosis (Figure S7A to S7C). In the second experiment, we measured the intensity of ER-FlucDM-eGFP in early dividing cells and late dividing cells after release from the G2/M boundary and found that there was no significant difference between early and late dividing cells (Figure S7D). Thus, we concluded that protein degradation of ER-FlucDM-eGFP is not the primary mechanism of its clearance during cell division (Line 324). Furthermore, we included new data showing that ER-FlucDM-eGFP aggregate clearance depends on ER reorganization during cell division, suggesting that mitotic exit induces disassembly of the aggregates rather than protein degradation.

      (2) It is intriguing that the aggregates are nuclear. Is the nuclear localization mediated by localization to ER? A time course analysis would reveal this and would provide credence to the idea that the reporter was originally expressed in the ER. It is currently unclear if the reporter ever gets expressed in ER.

      We showed that in interphase cells, ER-FlucDM-eGFP co-localizes with mStayGold-Sec61β, which labels the ER structures (Figure 1B). So, ER-FlucDM-eGFP is expressed and present in the ER network and invaginates into the inner nuclear membrane as aggregates. We attempted to image the formation of ER-FlucDM-eGFP aggregates; however, this was technically challenging because the aggregates were very small and barely visible after clearance under our microscopy system.

      (3) It would be expected that the persistence of these aggregates would impact cell division and cellular health. An experiment addressing this hypothesis would be very useful in establishing the functional relevance of this observation in the context of the current study.

      We have performed proteomic analysis on cells expressing ER-FlucDM-eGFP and found that multiple proteins involved in the ER stress response were up-regulated (Figure S4A). Additionally, proteins related to cell cycle regulation were up-regulated upon expression of ER-FlucDM-eGFP (Figure S4A). The increase of these proteins may indicate perturbed cellular health (Line 344).

      (4) A recent report (PMID: 34467852) identified the role of ER tubules in controlling the size of certain misfolded condensates. Would specific ER substructures affect the nuclear localization and/or clearance of the FlucDM aggregates? This is tied to point#2 and would provide insights into the connection between ER and the nuclear aggregates.

      Thank you for your suggestions. We perturbed the ER remodeling process by knocking down ATL2 and ATL3, which are ER membrane fusion proteins, and found that clearance of ER-FlucDM-eGFP aggregates was affected (Figure 7A and B). Hence, perturbation of the distribution of ER tubules and ER sheets affects ER-FlucDM-eGFP aggregate clearance. We have also added the recent paper on the role of ER tubules in regulating the size of misfolded condensates to the discussion (Line 321).

      Reviewer #2 (Recommendations For The Authors):

      I expect that the images indicate z-sections. Should be indicated in legends as applicable.

      We have indicated whether the images are Z-stack or single Z-slices in the figure legends.  

      Small point: the control region (outside inclusion) that was bleached in 2c may be clearly indicated. 

      We have added the explanation in the figure legend of Figure 2C.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the neuroprotective effect of reserpine in a retinitis pigmentosa (P23H-1) model, characterized by a mutation in the rhodopsin gene. Their results reveal that female rats show better preservation of both rod and cone photoreceptors following reserpine treatment compared to males.

      Strengths:

      This study effectively highlights the neuroprotective potential of reserpine and underscores the value of drug repositioning as a strategy for accelerating the development of effective treatments. The findings are significant for their clinical implications, particularly in demonstrating sex-specific differences in therapeutic response.

      We sincerely appreciate the reviewer’s comments.

      Weaknesses:

      The main limitation is the lack of precise identification of the specific pathway through which reserpine prevents photoreceptor death.

      We acknowledge that the exact pathway through which reserpine exerts its protective effects on photoreceptors remains undetermined, yet our findings provide critical insights into potential mechanisms. Together with our previous report [PMID: 36975211], the studies being presented here validate proteostasis (including autophagy) and p53 signaling as the key pathways underlying reserpine-mediated survival of photoreceptors in retinal disease models. We also go a step further by showing an influence of the biological sex.

      We emphasize that the primary aim of this study was to demonstrate the effectiveness of reserpine in a different retinal degeneration model—specifically, the autosomal dominant RP model—which shares a retinal disease phenotype with the model used for initial screening but involves different genetic and molecular mechanisms of degeneration.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "Sex-specific attenuation of photoreceptor degeneration by reserpine in a rhodopsin P23H rat model of autosomal dominant retinitis pigmentosa" by Beom Song et al., the authors explore the transcriptomic differences between male and female wild-type (WT) and P23H retinas, highlighting significant gene expression variations and sex-specific trends. The study emphasizes the importance of considering biological sex in understanding inherited retinal degeneration and the impact of drug treatments on mutant retinas.

      Strengths:

      (1) Relevance to Clinical Challenges: The study addresses a critical limitation in inherited retinal degeneration (IRD) therapies by exploring a gene-agnostic approach. It emphasizes sex-specific responses, which aligns with recent NIH mandates on sex as a biological variable.

      (2) Multi-dimensional Methodology: Combining electroretinography (ERG), optical coherence tomography (OCT), histology, and transcriptomics strengthens the study's findings.

      (3) Novel Insights: The transcriptomic analysis uncovers sex-specific pathways impacted by reserpine, laying the foundation for personalized approaches to retinal disease therapy.

      We are grateful for highlighting the strengths of our work.

      Weaknesses:

      Dose Optimization

      The study uses a fixed dose (40 µM), but no dose-response analysis is provided. Sex-specific differences in efficacy might be influenced by suboptimal dosing, particularly considering potential differences in metabolism or drug distribution.

      We acknowledge the limitation of using a fixed dose (40 µM) of reserpine in this study without conducting a comprehensive dose-response analysis. In the primary screens, the EC50 of reserpine was approximately 20 µM. We doubled the concentration for injection to account for the potential loss of reserpine during the in vivo procedures. As we observed the rescue effect of reserpine in mice, we used the same concentration for rats. The fixed-dose approach was chosen to maintain consistency with previous studies evaluating reserpine in retinal degeneration models and to facilitate comparison across studies. Efforts to identify optimal dosing were deprioritized, as the primary goal was different and this information cannot be directly translated to clinical applications.

      We also agree that sex-specific differences in efficacy might be influenced by suboptimal dosing, particularly given potential variations in metabolism, drug distribution, and pharmacokinetics between male and female rats. However, recent pharmacokinetic studies on systemically administered reserpine in rats reported no statistically significant covariates, including body weight, age, breed, or sex, affecting pharmacokinetic (PK) or pharmacodynamic (PD) parameters (Alfosea-Cuadrado, G. M., Zarzoso-Foj, J., Adell, A., Valverde-Navarro, A. A., González-Soler, E. M., Mangas-Sanjuán, V., & Blasco-Serra, A. (2024). Population Pharmacokinetic–Pharmacodynamic Analysis of a Reserpine-Induced Myalgia Model in Rats. Pharmaceutics, 16(8), 1101. https://doi.org/10.3390/pharmaceutics16081101). Furthermore, no evidence of sex-specific differences in reserpine pharmacokinetics has been previously identified in available databases (National Center for Biotechnology Information (2025). PubChem Compound Summary for CID 5770, Reserpine. Retrieved January 13, 2025 from https://pubchem.ncbi.nlm.nih.gov/compound/Reserpine). Importantly, the drug in this study was administered intravitreally, where the ocular compartments are relatively isolated from systemic metabolism or excretion. Under these conditions, where absorption, distribution, metabolism, and excretion have minimal impact, we observed sex differences in efficacy using the same dose of drug.

      Nonetheless, we agree with the reviewer and plan to pursue dose-response and other studies in future investigations.

      Statistical Analysis

      In my opinion, there is room for improvement. How were the animals injected? Was the contralateral eye used as a control? (There is no information about this in the manuscript; line 390 only mentions the volume and concentration of the injections.) If so, why not use a parametric paired analysis? Why use a non-parametric test such as the Mann-Whitney U test? The Mann-Whitney U test is usually employed for discontinuous count data; is that the case here?

      Therefore, please specify whether contralateral eyes or independent groups served as controls. If contralateral controls were used, paired parametric tests (e.g., paired t-tests) would be statistically appropriate. Alternatively, if independent cohorts were used, non-parametric Mann-Whitney U tests may suffice but require clear justification.

      We apologize for the lack of clarity. In line 124, we described the injection as “bilateral intravitreal injections of 5 µL of either vehicle or 40 µM reserpine,” and in Figure 1A, we annotated the bilateral injection as DMSO for both eyes and RSP for both eyes. To address this uncertainty, we added the clarification, “with each group receiving bilateral injections of either vehicle or reserpine” (lines 404–405). Since the results are not paired and involve continuous data for which the normality assumption cannot be confidently met or verified, we used the Mann-Whitney U test for statistical analysis.
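      Because each animal received the same treatment in both eyes, the comparison is between independent cohorts rather than paired eyes. As an illustration only, the sketch below computes the Mann-Whitney U statistic, which compares rank sums and therefore does not assume normally distributed data. It is a minimal Python sketch using only the standard library; the function names and example values are hypothetical and are not taken from this study. The p-value uses the large-sample normal approximation with a tie correction, so with the small group sizes typical of animal studies an exact test (for example `scipy.stats.mannwhitneyu(x, y, method="exact")`) would be preferable.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the mean (mid) rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test for two independent samples.

    Returns (U statistic for x, approximate p-value). The p-value uses the
    large-sample normal approximation with a tie correction and no
    continuity correction.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    # U for x: rank sum of x minus its minimum possible value.
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u1, p


# Hypothetical, illustrative values (not data from this study).
vehicle = [1, 2, 3, 4, 5]
reserpine = [6, 7, 8, 9, 10]
u, p = mann_whitney_u(vehicle, reserpine)
print(f"U = {u}, p = {p:.4f}")
```

      Had contralateral eyes served as controls instead, each animal would contribute a matched pair, and a paired test on within-animal differences (such as a paired t-test or a Wilcoxon signed-rank test) would be the natural alternative.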

      Sex-Specific Pathways

      The authors do identify pathways enriched in female vs. male retinas but fail to explicitly connect these to the changes in phenotype analysed by ERG and OCT. The lack of mechanistic validation weakens the argument.

      The study does not explore why female rats respond better to reserpine. Potential factors such as hormonal differences, retinal size, or differential drug uptake are not discussed.

      It remains open, whether observed transcriptomic trends (e.g., proteostasis network genes) correlate with sex-specific functional outcomes.

      We acknowledge that, while we identified pathways enriched in female versus male retinas, we did not explicitly connect these findings to the functional phenotypes measured by ERG and OCT. Although our transcriptomic data suggest that reserpine differentially influences pathways such as proteostasis and p53 signaling, we did not conduct mechanistic experiments to validate a causal relationship between these pathways and the observed outcomes.

      In practice, designing a study to validate the mechanisms of a small molecule modulating multiple pathways presents significant challenges. If the pathways cannot be specifically modulated or if modulation could result in irreversible outcomes, the mechanistic validation becomes difficult to achieve. Drugs demonstrating mutation-agnostic efficacy are often investigated primarily through outcome measures and the analysis of affected pathways rather than through direct mechanistic validation (Leinonen, H., Zhang, J., Occelli, L. M., Seemab, U., Choi, E. H., L P Marinho, L. F., Querubin, J., Kolesnikov, A. V., Galinska, A., Kordecka, K., Hoang, T., Lewandowski, D., Lee, T. T., Einstein, E. E., Einstein, D. E., Dong, Z., Kiser, P. D., Blackshaw, S., Kefalov, V. J., Tabaka, M., … Palczewski, K. (2024). A combination treatment based on drug repurposing demonstrates mutation-agnostic efficacy in pre-clinical retinopathy models. Nature communications, 15(1), 5943. https://doi.org/10.1038/s41467-024-50033-5).

      As recommended, we added potential factors that might influence the differential response to reserpine, based on other studies (lines 353–362) highlighting differences in dopamine storage capacity and estrogen independence. We also added a discussion on the possibility of sex-related differences in basal ERG response levels (lines 363–366).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study presents compelling findings on the neuroprotective effects of reserpine in a well-established model of retinitis pigmentosa (P23H-1). The use of ERG, optomotor assays, OCT, immunohistochemistry, and transcriptomic techniques provides a good exploration of the treatment's effects, particularly highlighting the differential response in females. The study underscores the potential of drug repurposing to expedite the availability of therapeutic interventions for patients.

      Thanks for your generous comments.

      While the manuscript presents an important contribution, I would like to highlight a few points that need clarification or further elaboration to strengthen the work:

      (1) Please include the photopic a-wave data in your analysis or provide a justification for its omission. Specifically, it would be valuable to know whether there is an improvement in this parameter under reserpine treatment.

      We appreciate the reviewer’s suggestion to include photopic a-wave data in our analysis and acknowledge the importance of this parameter in evaluating cone photoreceptor function. However, we did not analyze the photopic a-wave amplitude in our study because we found that the photopic a-wave has low amplitude and high variability, consistent with findings in other studies of P23H-1 rats (Orhan E, Dalkara D, Neuillé M, Lechauve C, Michiels C, et al. (2015) Genotypic and Phenotypic Characterization of P23H Line 1 Rat Model. PLOS ONE 10(5): e0127319. https://doi.org/10.1371/journal.pone.0127319) and even of wild-type rats (V.L. Fonteille, J. Racine, S. Joly, A.L. Dorfman, S. Rosolen, P. Lachapelle; Do Rats Generate a Photopic a-Wave? Invest. Ophthalmol. Vis. Sci. 2005;46(13):2246). We added a description (lines 435-437) explaining why the photopic a-wave was not analyzed. Other studies using P23H-1 rats likewise did not analyze the photopic a-wave, probably for similar reasons.

      (2) In Figure 1, it would be helpful to include data from normal control animals to provide a benchmark for retinal degeneration in P23H-1 animals and to better contextualize the effects of reserpine treatment.

      Thanks. As suggested, we have included data from normal control animals in Figure 1.

      (3) The manuscript states that "Treated female retinas have significantly higher expression of the gene for P62 (SQSTM1), indicating a potential key route for reserpine's activity" (Line 331). Please explain how this difference in expression might translate into a better photoreceptor response in females compared to males.

      The difference in P62 (SQSTM1) expression between treated female and male retinas could have important implications for the photoreceptor response. In our previous study, we found that reserpine increased P62, which mediates proteome balance between the ubiquitin-proteasome system (UPS) and autophagy. Together with the role of P62 in the regulation of oxidative stress, this suggests that P62 might be important for photoreceptor survival and function. Higher expression of P62 in treated females could indicate more efficient cellular maintenance and a better ability to cope with stress, leading to improved photoreceptor survival and function.

      (4) Numerous studies have shown that animal models of Parkinson's disease (e.g., those treated with MPTP or rotenone) or retinal tissue from Parkinson's patients exhibit dopaminergic cell death and associated vision loss. Please discuss how these findings relate to your results. Can you hypothesize how dopamine depletion by reserpine may lead to improved photoreceptor responses in your model?

      We appreciate the reviewer’s insightful comments. Both MPTP and rotenone act via inhibition of complex I of the respiratory chain, causing cell death and leading to dopamine depletion. In contrast, reserpine acts by inhibiting the vesicular monoamine transporter, depleting catecholamines by preventing their storage and facilitating their metabolism by monoamine oxidase. Although reserpine and other agents can induce animal models of Parkinson's disease, reserpine differs from the others in several aspects: (i) reserpine does not induce neurodegeneration or protein aggregation; (ii) motor performance, monoamine content, and TH staining are partially restored after treatment interruption; and (iii) reserpine lacks specificity for dopaminergic neurotransmission (Leão, A. H., Sarmento-Silva, A. J., Santos, J. R., Ribeiro, A. M., & Silva, R. H. (2015). Molecular, Neurochemical, and Behavioral Hallmarks of Reserpine as a Model for Parkinson's Disease: New Perspectives to a Long-Standing Model. Brain pathology (Zurich, Switzerland), 25(4), 377–390. https://doi.org/10.1111/bpa.12253). We have discussed the various effects of catecholamine depletion on retinal diseases (lines 331–337). Both dopamine receptor antagonists and agonists, as well as catecholamine depletion, can exert protective effects on the retina. The reduction in scotopic b-wave amplitude observed at P54, followed by a lack of further progression in degeneration, may support the hypothesis that reduced neuronal activity due to catecholamine depletion mitigated damage to retinal neurons.

      (5) For readers who may not be familiar with the P23H-1 mutation, it would be beneficial to include a brief description of the timeline and progression of retinal degeneration in this model.

      As the reported progression varies among studies, we have described the progression observed in animals housed in the same facility as those used in this study. The timeline and progression of retinal degeneration are briefly described in the Results section (lines 112–115) and Supplementary Figure 1.

      (6) Do you have any data on the effects of reserpine treatment in older animals? If available, this could provide additional insight into the potential applicability of reserpine in later stages of disease progression.

      Unfortunately, we do not have data from older animals. As described in the results section (lines 116–124), we set the timepoint for interventions before functional impairment peaked, aiming to harness the remaining potential for rescue and promote functional improvement. Our approach focused on developing a gene-agnostic therapy that can delay disease progression and be delivered at an earlier stage than AAV-based therapies, using FDA-approved drugs.

      (7) Molecular Basis of Sex Differences: The molecular mechanisms underlying the differential responses in males and females should be elaborated upon. If possible, include a discussion or hypothesis that addresses these sex-specific differences at the molecular level.

      We thank the reviewer for highlighting the importance of addressing the molecular basis of sex-specific differences. In our study, we observed distinct transcriptomic responses to reserpine between male and female rats, particularly in molecular pathways related to proteostasis and p53 signaling. While the sex-specific differences in these molecular pathways remain to be fully evaluated, we have added a discussion on sex differences in reserpine responses, incorporating findings from other studies (lines 353–366).

      Reviewer #2 (Recommendations for the authors):

      (1) There is no mention in the manuscript about the fact that the transgene rats have several copies of rhodopsin and how this can affect these sex differences. Would it be the same in the P23H KO mouse? Or in other models with a single copy of the mutation?

      We have described in the Materials and Methods section how the animals were bred, but we did not specifically mention the allele status in the manuscript. Hemizygous P23H-1 rats used in this study carry a single P23H transgene allele with a transgene copy number of 9, in addition to the two normal wild-type opsin alleles. We added this description to remove the uncertainty (lines 384-387).

      (2) This sentence in the abstract (lines 26 to 29), "Recently, we identified reserpine as a lead molecule for maintaining rod survival in mouse and human retinal organoids as well as in the rd16 mouse, which phenocopy Leber congenital amaurosis caused by mutations in the cilia-centrosomal gene CEP290 (Chen et al. eLife 2023;12:e83205. DOI: https://doi.org/10.7554/eLife.83205)", in my view does not belong in the abstract; it might fit in the introduction as state of the art.

      Thank you for asking. According to the guidelines for the research advance articles (that follow previously published studies), a reference to the original eLife article should be included in the abstract. As specified in the guidelines, we have updated the citation format to (author, year) for referencing eLife articles (line 29).

      (3) Lines 167-170: "Histologic evaluation of the retinas also demonstrated more prominent ONL thinning in the dorsal retina and increased ONL thickness in the dorsal retina measured at 1,000, 1,250, and 1,500 µm distant from the optic nerve head in reserpine-treated group compared with control group (Figure 3C)". I do not understand this sentence. Is it a more prominent thinning or an increased thickness?

      We apologize for the confusion caused by this sentence. The histological evaluation showed that ONL thinning was more pronounced in the dorsal retina of the control group, which was consistent with the OCT findings in Figure 3A. Reserpine treatment increased the ONL thickness in the dorsal retina at specific distances from the optic nerve head (1,000, 1,250, and 1,500 µm). We have revised the sentence for clarity (lines 165-168).

      (4) Lines 182-185 and Figure 4B: Fluorescence (FL) is not the best approach to quantify rhodopsin levels. Since the DAPI staining is overexposed, it is hard to evaluate the staining of RHO in the ONL. From the visible staining in the OS, it is only possible to affirm that the OS are longer in RSP-treated retinas; nothing more can be affirmed based on these figures. I suggest using WB.

      We acknowledge the reviewer’s concern regarding the use of fluorescence imaging to quantify rhodopsin levels. While our current data highlight structural preservation, such as the length of the outer segments, we agree that drawing conclusions about rhodopsin levels from fluorescence staining is limited. As we do not have samples for WB and fluorescence imaging cannot quantify rhodopsin, we have revised the description (lines 180-184).

      (5) Lines 188-190 and Figure 4C: The images in 4C show an extreme divergence between treated and untreated retinas in the number of stained cones, which is not reflected in the statistical quantification at 1000 µm. Are the images not representative?

      We agree with the reviewer that the images in Figure 4C may not adequately represent the quantified data. To address this, we have changed the figure to reflect the quantification results accurately.

      (6) Figures 6C-6D and 6G. Why do the authors not use any statistical analysis? Or are the differences not statistically significant? Why do authors use only WT and DMSO controls? What about untreated P23H controls (no DMSO)?

      Thanks for checking, and we apologize for the oversight. We have updated Figures 5, 6, and S5 to include adjusted p-values in the relevant plots. In addition, details of the significance thresholds are available in the supplementary tables. Regarding controls, untreated P23H retinas (without DMSO) were not included in the current analysis, as our experience shows that DMSO injection itself does not cause functional or structural changes. The key data demonstrating the effect of reserpine involve a comparison between the group treated with reserpine and the control group treated with DMSO, as the only difference between these groups is the presence of the drug.

      (7) Validation of findings by testing key genes (e.g., p62/SQSTM1, Nrf2) using qPCR or immunohistochemistry will strengthen the findings.

      We appreciate the reviewer’s suggestion to validate key findings using qPCR or immunohistochemistry, as such experiments are crucial for further strengthening our conclusions. While this was not feasible in the current study due to various constraints, we fully recognize their importance and plan to incorporate these in our follow-up studies.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Response to Public Reviews:

      We would like to thank the reviewers and editors once more for their time and effort in reviewing our manuscript. Below we discuss specifically our response to the recommendations of Reviewer 2, which were the only substantial changes we made to the manuscript.

      Reviewer 2 recommendation:

      "My only remaining suggestion is that the authors acknowledge and cite the work of other groups which have similarly found different subsets of LADs based on various molecular/epigenetic features:

      (1) doi.org/10.1101/2024.12.20.629719

      (2) PMID: 25995381

      (3) PMID: 36691074

      (4) PMID: 23124521 (fLADs versus cLADs, as described by the authors themselves). The exact subtypes of LADs might be different based on the features examined, but others have found or implicated the existence of different types of LADs. Hence, the pwv-LAD should be contextualized within these findings (which they do relative to v-fiLADs)."

      We thank the reviewer for this suggestion and for these references. We think that the best place to go into depth about how our work relates to these references would be in an appropriate review article.

      However, we did read these references carefully and responded, as described below, by adding additional clarifying text in the manuscript as well as mention of articles specifically relevant to our description of our results.

      (1) Reviewer 2 wrote specifically, "Hence, the pwv-LAD should be contextualized within these findings (which they do relative to v-fiLADs)"

      We are not sure exactly what Reviewer 2 means here. In this manuscript we defined p-w-v iLADs, not LADs. So, it would be inappropriate to compare a subset of iLAD regions with different types of LADs.

      If this was the meaning of Reviewer 2, then other readers might have similar confusion. Therefore, we added the following clarifying text in red:

      "Several previous studies have used varying approaches to subdivide LADs further into distinct subsets of LADs with different biochemical and/or functional properties (Martin et al., 2024; Meuleman et al., 2013; Shah et al., 2023; Zheng et al., 2015). However, in this Section we focused instead on asking whether regions specifically within iLADs might show differential localization relative to the lamina and/or nucleoli and, if so, whether these regions would show different levels of gene expression. More specifically, analogously to how gene expression hot-zones appeared as local maxima in speckle TSA-seq with early DNA replication timing, we asked whether iLAD regions that appeared as local maxima in lamina proximity mapping signals would correspond to iLAD regions with locally reduced gene expression levels and later DNA replication timing relative to their flanking iLAD sequences. Our rationale was that these iLAD regions might represent chromatin domains that together with their flanking iLAD regions would typically localize well within the nuclear interior but in a fraction of the cell population would loop back and attach at the nuclear periphery."

      (2) We also added the following text near the end of the section about p-w-v iLADs to place them in the context of one class of "LADs" identified by ChIP-seq rather than DamID. We use quotation marks since the approach used produced a segmentation that included a nearly 50/50 mix of iLAD and LAD regions, as identified by DamID, for this class of domains.

      "We note that in a previous study a three-state Hidden Markov Model (HMM) segmented lamin B ChIP-seq data into two chromatin domain states with extensive overlap with LADs defined by lamina DamID (Shah et al., 2023). Whereas the late replicating, low gene density/expression "T1 LAD" state showed very high overlap (98%) with LADs defined by DamID, the intermediate replicating, intermediate gene expression "T2 LAD" state showed only 47% overlap with LADs defined by DamID. This was partly a result of the HMM segmentation algorithm but also due to substantial differences between the lamina ChIP-seq versus DamID signals for reasons that remain unclear. The subset of p-w-v iLADs included in T2 comprise only a small percentage of the total T2 LAD coverage, which includes both other iLAD and LAD regions. Thus, the p-w-v iLADs we identified here represent a novel and distinct class of iLAD chromatin domains, not previously described."

      (3) Alternatively, what Reviewer 2 might be suggesting implicitly is that we should start with the regions identified as p-w-v iLADs in one cell type and then identify all of those p-w-v iLADs which instead exist as LADs in a second cell type. Once we have identified their LAD equivalents in a second cell type we could then ask whether they possess special characteristics such that they correspond to a specific type of LAD subset. Finally, we could then ask how that specific type of LAD subset compared to the different subtypes of LADs identified by other groups and, in particular, the references Reviewer 2 provided.

      We agree that would be an interesting future direction, but we consider that as outside the scope of this current manuscript. We note that we did no such analysis of the characteristics of LADs which existed as p-w-v iLADs in a different cell line. We save that for a possible future analysis, ideally in the same cell types as used in the cited references to allow a more direct comparison.

      (4) Finally, we added text in the Discussion that relates our analysis of the differential SON and LMNB1 TSA-seq signals for different LAD regions, and how these correlate with different histone modifications, with results from the recent preprint cited by Reviewer 2. Note that we could not directly correlate our results from human cells with the three classes of LADs described in MEFs by this preprint.

      "Fourth, we show how LAD regions showing different histone marks (enrichment in H3K9me3, in H3K9me2 plus H2A.Z, in H3K27me3, or in none of these marks) can differentially segregate within nuclei. These results support the previous suggestion of different "flavors" of LAD regions, based on the sensitivity of the autonomous targeting of BAC transgenes to the lamina to different histone methyltransferases (Bian et al., 2013). Differential nuclear localization also was recently inferred by the appearance of different Hi-C B subcompartments, which similarly were differentially enriched in either H3K9me3, H3K27me3, or the combination of H3K9me2 and H2A.Z (Spracklin et al., 2023). More recently, and while this paper was in revision, a new study described segmenting mouse embryonic fibroblast LADs into three clusters using histone modification profiling (Martin et al., 2024). Interestingly, these three LAD clusters also most notably differed by their dominant enrichment of either H3K9me3, H3K9me2, or H3K27me3. Thus, three orthogonal approaches have converged on identifying different LAD regions showing differential enrichment either of H3K9me3, H3K9me2, or H3K27me3. Here, our use of TSA-seq directly measures and assigns the intranuclear localization of these different LAD regions to different nuclear locales."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Gray and colleagues describe the identification of Integrator complex subunit 12 (INTS12) as a contributor to HIV latency in two different cell lines and in cells isolated from the blood of people living with HIV. The authors employed a high-throughput CRISPR screening strategy to knock down genes and assess their relevance in maintaining HIV latency. They had used a similar approach in two previous studies, finding genes required for latency reactivation or genes preventing it and whose knockdown could enhance the latency-reactivating effect of the NFκB activator AZD5582. This work builds on the latter approach by testing the ability of gene knockdowns to complement the latency-reactivating effects of AZD5582 in combination with the BET inhibitor I-BET151. This drug combination was selected because it has been previously shown to display synergistic effects on latency reactivation.

      The finding that INTS12 may play a role in HIV latency is novel, and the effect of its knockdown in inducing HIV transcription in primary cells, albeit in only a subset of donors, is intriguing. However, there are some data and clarifications that would be important to include to complement the information provided in the current version of the manuscript.

      We have now added the requested data and clarifications. In particular, we show that knockout of INTS12 has no effect on cell proliferation (new data added in Figure 2—figure supplement 3), we clarify how the degree of knockout and the complementation were accomplished, we clarify the differences between the RNA-seq and the activation scores, and we have bolstered the claim that INTS12 affects transcription elongation by performing CUT&Tag on Ser2 phosphorylation of the C-terminal tail of RNAPII along the length of the provirus (new data added in Figure 5C). Please see detailed responses below.

      Reviewer #2 (Public review):

      Summary:

      The authors identify an important role for the Integrator complex in repressing HIV transcription and suggest that by targeting subunits of this complex, specifically INTS12, reversal of latency with and without latency reversal agents can be enhanced.

      Strengths:

      The strengths of the paper include the general strategy for screening targets that may activate HIV latency and the rigor of exploring the mechanism of INTS12 repression of HIV transcriptional elongation. I found the mechanism of INTS12 interesting and maybe even the most impactful part of the findings.

      Weaknesses:

      I have two minor comments:

      There was an opportunity to examine a larger panel of latency reversal agents that reactivate by different mechanisms to determine whether INTS12 and transcriptional elongation are limiting for a broad spectrum of latency reversal agents.

      I felt the authors could have extended their discussion of how exquisitely sensitive HIV transcription is to pausing and transcriptional elongation and the insights this provides about general HIV transcriptional regulation.

      We have now added data on latency reversal agents of different mechanisms of action. We show that INTS12 affects HIV latency reversal from agents that affect the non-canonical NF-kB pathway (AZD5582), the canonical NF-kB pathway (TNF-alpha), activation via the T-cell receptor (CD3/CD28 antibodies), through bromodomain inhibition (I-BET151), and through a histone deacetylase inhibitor (SAHA). This additional data has been added to the manuscript in Figure 7, panels B and C as well as adding text to the discussion.

      We appreciate the suggestion to extend the discussion to emphasize how important pausing and elongation are to HIV transcription. We previously showed with CUT&Tag data an increase in total RNAPII within HIV (Figure 5B); to further support our claim that INTS12 KO with AZD5582 & I-BET151 leads to increased elongation, we measured RNAPII Ser2 phosphorylation (Figure 5C) and RNAPII Ser5 phosphorylation (Figure 5—figure supplement 2) and added these findings to the manuscript. Ser2 phosphorylation is a marker associated with elongation, and we observed increased Ser2 phosphorylation within HIV, indicating elongation-competent RNAPII, in both the AZD5582 & I-BET151 condition and the INTS12 KO with AZD5582 & I-BET151 condition. Despite seeing elongation-competent RNAPII in both conditions, we only saw a dramatic increase in total RNAPII in the INTS12 KO with AZD5582 & I-BET151 condition (Figure 5B), which supports the conclusion that there are more elongation events and that an elongation block is overcome specifically when INTS12 KO is paired with AZD5582 & I-BET151. This claim is further supported by our data showing an increase in virus in the supernatant only with the INTS12 KO with AZD5582 & I-BET151 condition in cells from PLWH (Figure 6C). We did not observe any statistically significant differences in RNAPII Ser5 phosphorylation between conditions, which might be expected because this mark is not associated with elongation (Figure 5—figure supplement 2).

      Reviewer #3 (Public review):

      Summary:

      Transcriptionally silent HIV-1 genomes integrated into the host's genome represent the main obstacle to an HIV-1 cure. Therefore, agents aimed at promoting HIV transcription, the so-called latency reactivating agents (LRAs), might represent useful tools to render these hidden proviruses visible to the immune system. The authors successfully identified, through multiple techniques, INTS12, a component of the Integrator complex involved in 3' processing of small nuclear RNAs U1 and U2, as a factor promoting HIV-1 latency and hindering elongation of the HIV RNA transcripts. This factor synergizes with a previously identified combination of LRAs, one of which, AZD5582, has been validated in the macaque model for HIV persistence during therapy (https://pubmed.ncbi.nlm.nih.gov/37783968/). The other compound, I-BET151, is known to synergize with AZD5582 and is an inhibitor of BET proteins, factors counteracting the elongation of RNA transcripts.

      Strengths:

      The findings were confirmed through multiple screens and multiple techniques. The authors successfully mapped the identified HIV silencing factor at the HIV promoter.

      Weaknesses:

      (1) Initial bias:

      Regarding the choice of the genes included in the library, the authors refer back to their previous paper (Hsieh et al.), where it is stated: "To specifically investigate host epigenetic regulators involved in the maintenance of HIV-1 latency, we generated a custom human epigenome specific sgRNA CRISPR library (HuEpi). This library contains sgRNAs targeting epigenome factors such as histones, histone binders (e.g., histone readers and chaperones), histone modifiers (e.g., histone writers and erasers), and general chromatin associated factors (e.g., RNA and DNA modifiers) (Fig 1B and 1C)".

      From these figure panels, it clearly appears that the genes chosen all belong to the indicated pathways. While I have no objection to the relevance of the selected pathways to HIV latency, the authors should spend some words on the criteria followed to select these pathways. Other pathways involving epigenetic modifications, and containing genes not represented in the indicated pathways, may have been left out.

      (2) Dereplication:

      From Figure 1 it appears that INTS12 knockdown alone reactivates HIV-1 from latency without any drug intervention, as shown by the MAGeCK score of the DMSO-alone controls. If INTS12 knockdown alone shows antilatency effects, why were the authors unable to identify it in their previous article (Hsieh et al., 2023)? The authors should include some words comparing the results obtained with DMSO alone with those of the previous screen that they conducted.

      (3) Translational potential:

      In order to propose a protein as a drug target, it is necessary to adhere to the "primum non nocere" principle in medicine. It is therefore fundamental to show the effects of INTS12 knockdown on cell viability/proliferation (and, advisably, T-cell activation). These data are not reported in the manuscript in its current form, and the authors are strongly encouraged to provide them.

      Finally, as many readers may not be very familiar with the general principles behind CRISPR-Cas9 screening, I suggest referring them to this excellent review: https://pmc.ncbi.nlm.nih.gov/articles/PMC7479249/.

      (1) The CRISPR library used was described more completely in a previous publication (Hsieh et al., PLOS Pathogens, 2023). However, we now refer the reader more explicitly to information about the pathways targeted in the library. We also point out how initial hits in the library led to genes outside the starting library, as in the follow-up screen in Figure 7, where each member of the INT complex is interrogated even though INTS12 was the only member in the initial library.

      (2) We understand the confusion between the hits in this paper and those in a previous publication. Indeed, INTS12 was observed in Hsieh et al., PLOS Pathogens, 2023 as a hit in the Venn diagram of Figure 3B and in the right panel of Figure 5A of that paper. However, it was not followed up there, since that paper focused on a hit that uniquely increased the potency of one particular LRA. We added text to the present manuscript to make clear that the screens identified many of the same hits. We have also added data on hit validation to underscore the reliability of the CRISPR screen. In one of the cell lines (5A8), EZH2 was a strong hit (Figure 1B). We now show that an EZH2 inhibitor augments latency reversal by AZD5582/I-BET151, as predicted from the screen. These data have been added to Figure 1, figure supplement 1.

      (3) We appreciate the concern that for INTS12 to be a drug target, it should not be essential for cell viability. We now show that knockout of INTS12 has no effect on cell proliferation (new data added in Figure 2—figure supplement 3). In addition, the Discussion now cites further literature describing how knockout of INTS12 has relatively minor effects on cell functions compared with knockout of other INT members, which supports the proposal that modulating INTS12 may be more specific than targeting the catalytic modules of Integrator. Nonetheless, we completely agree with the reviewer that many other aspects of how INTS12 affects T cell functions, as well as other potential detrimental effects of INTS12 as a drug target in vivo, have not been addressed. We now describe these caveats more explicitly in the Discussion, but we see the present manuscript as a first step, with a long path ahead before its translational potential might be realized.

      (4) We now cite the review of CRISPR screens suggested by the reviewer.

      Responses to recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors report in the legend of Figure 2 (and similarly in other figures) that there was "a calculated INTS12 knockout score of 76% (for the one guide used) and 69% (for one of three guides used), respectively." However, it would be helpful to show representative data on the efficiency of INTS12 knockdown in cell lines and primary cells, as well as data on the efficiency of the complementation (Figure 2C).

      The knockout scores cited are genetic estimates of editing efficiency derived from sequencing files. Because the knockouts are done with multiple guides, the knockout score for each guide underestimates the total knockout. The complementation, however, was done by adding back INTS12 in a lentiviral vector that also contains a drug resistance marker (puromycin). Cells were then selected for puromycin resistance, and therefore all of them contain the complemented gene. Ideally, one would use a Western blot to quantify the amount of INTS12 remaining in the knockout pools. Unfortunately, despite obtaining INTS12 antibodies from multiple commercial sources, we were unable to identify one that was suitable for Western blotting (as opposed to two that did work for CUT&Tag). Nonetheless, the functional data in primary T cells from PLWH and in J-Lat cell lines show that even if the knockout is suboptimal, INTS12 knockout leads to activation (e.g., Figure 6).

      (2) Flow cytometry methods are not reported, but was a viability dye included when testing GFP reactivation (Figure S2)? More broadly, showing data on the viability of cells post-knockdown and drug treatments would help, as cell mortality is inherently associated with latency reactivation in J-Lat cells. For the same reason, reporting viability data would be important for primary cells, as the electroporation procedure can lead to significant mortality.

      We did not include viability dyes in the data for GFP activation. However, as described in the public response, we have done growth curves in J-Lat 10.6 cells with and without INTS12 knockout and find no effects on cell proliferation (Figure 2—figure supplement 3). As the reviewer points out, it is not possible to do these experiments in primary cells since the electroporation itself causes a degree of cell death. Nonetheless, we do see effects on HIV activation in these primary cells (Figure 6).

      (3) Figure S2 shows a relatively high baseline expression (approximately 15%) of HIV-GFP, which is not unusual for the J-Lat 10.6 clone. However, Figure 3 appears to show no HIV RNA reads in the control condition of this same cell clone. How do the authors reconcile this discrepancy?

      We believe that the discrepancies in the flow cytometry versus RNA-seq assays are due to differences in the sensitivity of the assays, the linear range of the assays especially at the lower end, and the different half-lives of RNA versus protein. We now clarify that Figure 3 does not show “no” HIV RNA at baseline, but rather values of ~30 copies per million read counts. This increases to ~800 copies per million read counts when INTS12 knockout cells are treated with AZD5582/I-BET151. These values have the same fold change predicted in Figure 4, and more closely resemble the trend in Figure 2—figure supplement 1.
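
      A minimal sketch of the normalization behind these values, assuming the standard counts-per-million (CPM) definition (reads assigned to HIV divided by total mapped reads, times one million). The read counts in the example are hypothetical; only the approximate ~30 and ~800 CPM figures come from the response above, and their ratio is the fold change being compared.

```python
def counts_per_million(feature_reads: int, total_mapped_reads: int) -> float:
    """Normalize a raw read count to counts per million mapped reads."""
    return feature_reads / total_mapped_reads * 1e6


def fold_change(treated_cpm: float, control_cpm: float) -> float:
    """Ratio of normalized expression between two conditions."""
    return treated_cpm / control_cpm


# Hypothetical library: 600 HIV-mapped reads out of 20 million mapped reads (about 30 CPM).
example_cpm = counts_per_million(600, 20_000_000)

# Approximate values quoted in the response: ~30 CPM at baseline and
# ~800 CPM in INTS12 knockout cells treated with AZD5582/I-BET151.
print(round(fold_change(800, 30), 1))  # 26.7
```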

      (4) The combination of AZD5582 and I-BET151 consistently reactivates HIV latency (including GFP protein expression), as previously reported and as shown here by the authors. However, in Figure 5B, RPB3/RNAPII occupancy in the DMSO control appears higher than in the AAVS1KO + AZD5582 and I-BET151 samples. This should be discussed, as it could raise concerns about the robustness of RPB3/RNAPII occupancy results as a proxy for provirus elongation.

      As addressed in the public comments, to strengthen our claims about transcriptional elongation control, we measured RNAPII Ser2 and Ser5 phosphorylation levels. We see evidence of elongation (Ser2) in the condition of concern (AAVS1 KO + AZD5582 & I-BET151) as well as in our main condition of interest (INTS12 KO + AZD5582 & I-BET151), and no change in Ser5 for any condition. Taken together with the Ser2 phosphorylation and total RNAPII data, as well as our virus release and transcription data, we believe we are seeing evidence of increased elongation with INTS12 KO plus AZD5582 & I-BET151. One potential nuance that the CUT&Tag data may not capture is the turnover rate of the polymerase. Although RNAPII levels appear lower in the condition of concern (AAVS1 KO + AZD5582 & I-BET151) than in DMSO, it is possible that low levels of elongation are occurring, whereas in the INTS12 KO + AZD5582 & I-BET151 condition elongation is more rapid, which would explain why we observe more RNAPII within the HIV provirus. These new data are added in Figure 5C and Figure 5—supplement 2, and their implications are now described in more detail in the Discussion.

      (5) The authors write that "Degree of reactivation was correlated with reservoir size as donors PH504 (star symbol) and PH543 (upside down triangle) have the largest HIV reservoirs (supplemental Figure S2)." I could not find mention of the reservoir size of these donors in the figure provided.

      This confusion was caused by mislabeling of the supplement number, which we have fixed. We have also added labeling to make the reservoir size easier to find, as this is an important part of the manuscript. This information is now found in Supplemental file S4.

      Reviewer #3 (Recommendations for the authors):

      (1) The MAGeCK gene score is essential for interpreting the results in Figure 1. The authors do cite the Li et al. paper where this score was first described (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0554-4); however, not all readers may be familiar with it. A short didactic description of the score should therefore be given when introducing the results in Figure 1.

      We have added a short description to the paper to address this.

      (2) Figure 4. The authors write: "Among the host genes most prominently affected by INTS12 knockout with AZD5582 & I-BET151 are MAFA, MAFB, and ID2 (full list of genes in supplemental file S3)." I am a bit confused. In the linked Excel file there is only a list of a few genes. The differentially expressed genes appear to be many more from Figure 4. The full list should be uploaded.

      We believe there was a mistake in our original uploading and naming of the supplements. We have now double-checked numbering on the supplements and added in text clarification of which excel tabs hold the desired information.

      (3) Figure 6: The authors are right in highlighting that there is a high level of variability in viral RNA in supernatants in the early stages of viral reactivation. It is therefore advisable to repeat measurements at Day 7, at which variability decreases and data are more reliable (please, see: https://www.thelancet.com/journals/ebiom/article/PIIS2352-3964(23)00443-7/fulltext).

      While it would have been nice to prolong these measurements, our current assay conditions are not optimal for longer term growth of the cells. We note that the measurements were all done in biological triplicates (independent knockouts) and in different individuals. Because the number of activatable latent proviruses is variable and the number of cells tested is limiting, the variability in the assays is expected.

      (4) Figure 7: The main genes outside the INTS family should also be identified.

      We include the full list, sorted from most to least enriched, in supplemental file S5.

      (5) Methods: A statistics paragraph should be added to the Methods section, detailing the data analysis procedures and the key parameters used (for example, what MAGeCK gene score threshold was used to consider a knockdown effective on HIV latency?).

      There is no MAGeCK score threshold that we use to determine efficacy on HIV latency. In a previous publication using CRISPR screens for HIV Dependency Factors (Montoya et al., mBio 2023), we showed that there is a relationship between the MAGeCK score and the effect of the corresponding gene knockout on HIV replication (Figure 5 of that paper). However, it is a continuum rather than a strict threshold, and we believe that effects on HIV latency would behave similarly. In the current paper, we have focused on the top hits rather than a comprehensive analysis of the entire list. In case the reviewer is referring to the average and standard deviation of the non-targeting controls, we have added these to the figure legend and Methods.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary:  

      The study identifies two types of activation: one that is cue-triggered and nonspecific to motion directions, and another that is specific to the exposed motion directions but occurs in a reversed manner. The finding that activity in the medial temporal lobe (MTL) preceded that in the visual cortex suggests that the visual cortex may serve as a platform for the manifestation of replay events, which potentially enhance visual sequence learning.  

      Strengths: 

      Identifying the two types of activation after exposure to a sequence of motion directions is very interesting. The experimental design, procedures, and analyses are solid. The findings are interesting and novel. 

      Weaknesses: 

      It was not immediately clear to me why the second type of activation was suggested to occur spontaneously. The procedural differences in the analyses that distinguished between the two types of activation need to be clarified a little better.  

      We thank the reviewer for his/her summary and constructive feedback on our study. We appreciate the recognition of the strengths of our study.

      The second type of activation, namely the replay of feature-specific reactivations, is considered spontaneous because it reflects internally driven neural processes rather than responses directly triggered by external stimuli. Unlike responses evoked by stimuli, spontaneous replay is not time-locked to stimulus onset. Instead, it arises from the brain's intrinsic activity, typically observed during offline periods (e.g., rest or blank period) when external stimuli are absent. This allows the neural system to reactivate and consolidate prior experiences without interference from ongoing external stimuli.

      Replay is believed to be a key mechanism underlying various cognitive functions, such as memory consolidation (Gillespie et al., 2021; Gridchyn et al., 2020), learning (Igata et al., 2021), prediction and planning (Ólafsdóttir et al., 2018). Furthermore, the hippocampus and related cortical areas engage in replay to extract abstract relationships from sequential experiences, forming a "template" that can generalize across contexts (Liu et al., 2019). In our study, the feature-specific replay observed during blank periods likely reflects this process, supporting the integration of exposed motion direction sequences into cohesive memory representations and facilitating visual sequence learning.

      We have extended the Discussion section to incorporate this explanation (Lines 440 - 447).

      Regarding the second question, the procedural differences between the two types of activations lie in the classifiers used for the two analyses: a multiclass classifier for non-specific elevated responses and binary classifiers for feature-specific replay. 

      For the non-feature-specific elevated responses, we trained a five-class classifier (labels: the four RDKs and the ITI (inter-trial interval)) on the localizer data and tested it on the blank period in the main phase. We attempted to decode motion direction information at each time point at the group level. However, the results revealed no feature-specific information at the group level during the blank period.

      For the feature-specific replay, we employed temporal delayed linear modeling (TDLM) to examine whether individual motion direction information was encoded in a sequential and spontaneous manner. Here, we first needed to train four binary classifiers, each sensitive to only one motion direction (i.e., 0°, 90°, 180°, or 270°), as our aim was to quantify the evidence for feature-specific sequences in the subsequent analyses. For each classifier, positive instances were trials in which the corresponding feature (e.g., 0°) was presented, while negative instances included trials with other features (e.g., 90°, 180°, and 270°) and an equivalent amount of null data from the ITI period (1–1.5 s).
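
      The positive/negative labeling scheme for each binary classifier can be sketched as follows. This is an illustration, not the authors' code: the function name and arguments are hypothetical, the localizer patterns are assumed to be taken at a single time point, and "an equivalent amount of null data" is assumed here to mean as many ITI samples as there are positive trials, since the reference count is not specified.

```python
import numpy as np


def binary_training_set(X_trials, labels, target, X_null, n_null=None, rng=None):
    """Build (X, y) for one direction-specific binary classifier (hypothetical helper).

    X_trials: (n_trials, n_features) localizer patterns, one row per trial.
    labels:   (n_trials,) motion direction of each trial in degrees.
    target:   the direction this classifier should detect, e.g. 90.
    X_null:   (n_null_available, n_features) patterns sampled from the ITI.
    """
    rng = np.random.default_rng() if rng is None else rng
    is_target = labels == target
    # ASSUMPTION: "equivalent amount" of null data = number of positive trials.
    if n_null is None:
        n_null = int(is_target.sum())
    null_rows = rng.choice(len(X_null), size=n_null, replace=False)
    # Positives: trials of the target direction (y = 1).
    # Negatives: trials of the other three directions plus ITI samples (y = 0).
    X = np.vstack([X_trials, X_null[null_rows]])
    y = np.concatenate([is_target.astype(int), np.zeros(n_null, dtype=int)])
    return X, y
```

      Building one such set per direction (0°, 90°, 180°, 270°) yields the training data for the four classifiers, each of which is then fit separately.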

      We have clarified these methodological details in the Methods section (Pages 34 – 41).

      Reviewer #2 (Public review): 

      This paper shows and analyzes an interesting phenomenon. It shows that when people are exposed to sequences of moving dots (that is, dots moving in one direction, followed by another direction, etc.), showing either the starting or the ending movement direction causes a coarse-grained brain response that is similar to that elicited by the complete sequence of 4 directions. However, they show by decoding the sensor responses that this brain activity actually does not carry information about the actual sequence and the motion directions, at least not on the time scale of the initial sequence. They also show a reverse replay on a highly compressed time scale, which is elicited during the period of elevated activity and activated by the first and last elements of the sequence, but not by the others. Additionally, these replays seem to occur during periods of cortical ripples, similar to what is found in animal studies. 

      These results are intriguing. They are based on MEG recordings in humans, and finding such replays in humans is novel. Also, this is based on what seems to be sophisticated statistical analysis. However, this is the main problem with this paper. The statistical analysis is not explained well at all, and therefore its validity is hard to evaluate. I am not at all saying it is incorrect; what I am saying is that given how it is explained, it cannot be evaluated. 

      We thank the reviewer for the detailed evaluation and for acknowledging the novelty of our study.

      To address the concern about the statistical analysis, in the revised manuscript, we have modified the Methods section to provide a more detailed explanation of the analytical pipeline, particularly for several important aspects such as decoding probability and TDLM. (Lines 646 – 657, Lines 682 – 734). 

      Below, we provide point-by-point responses to further elaborate on these revisions and address the reviewer’s comments.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors): 

      I have questions.  

      (1) Participants were exposed to a predefined sequence of motion directions, either clockwise or counterclockwise. Is it possible that the observed replay is related to the activation of MST neurons? If the predetermined sequence were neither clockwise nor counterclockwise but randomly determined, such as 0° → 180° → 270° → 90°, would the same result be obtained?  

      We thank the reviewer for these thoughtful questions.

      First, regarding the potential involvement of MST neurons, it is plausible that the observed replay might involve activity in motion-sensitive brain regions, including the medial superior temporal (MST) and even middle temporal (MT) areas. MST neurons, located in the extrastriate visual cortex, are highly direction-selective and are known for their sensitivity to complex motion patterns, such as rotations and expansions (Duffy & Wurtz, 1991; Saito et al., 1986). In our experiment, the use of RDKs with four distinct motion directions might elicit responses in MST neurons. However, due to the limited spatial resolution of MEG, we cannot provide direct evidence for this claim. 

      Second, regarding the impact of randomly ordered sequences, we believe that the replay patterns would still occur even if the sequences were randomly ordered (e.g., 0° → 180° → 270° → 90°). After repeated exposure to a sequence, the hippocampus can encode abstract relationships within it. Evidence supporting this view comes from previous studies. For example, Liu et al. (2019) showed that replay does not merely recapitulate visual experience but can also follow a sequence implied by learned abstract knowledge. In their study, participants were instructed that viewing pictures C→D, B→C, and A→B implies a true sequence of A→B→C→D. During subsequent testing, they observed replay events following this learned true sequence, even with novel visual stimuli, indicating that the brain maintains sequence knowledge independent of specific stimuli. Similarly, Ekman et al. (2023) showed that prediction-based neural responses could be observed when moving dots were presented in a random order rather than in a clockwise or counterclockwise order, which correspond to the four motion directions in our study. 

      Together, these studies suggest that replay mechanisms in the brain are flexible and can encode and reproduce abstract relationships between sequential stimuli, regardless of their specific spatial contents. Therefore, we believe that even if the sequence were randomly ordered, the same backward replay pattern would still be observed.

      (2) Is it possible that the motion-direction non-specific responses actually reflect the replay of another feature of the exposed sequence, namely, the temporally rhythmic presentation of the sequence, rather than what is suggested in the Discussion?  

      We thank the reviewer for raising this insightful possibility.

      There is substantial evidence that rhythmic stimulation can entrain neural oscillations, which in turn facilitates predictions about future inputs and enhances the brain's readiness for incoming stimuli (Barne et al., 2022; Herrmann et al., 2016; Lakatos et al., 2008, 2013). In our study, the temporally rhythmic presentation of the motion sequence may have entrained oscillatory activity in the brain, leading to periodic activation of sensory cortices. This rhythmic entrainment could account for the observed nonspecific responses by reflecting the brain's temporal predictions rather than specific feature replay. 

      It is important to note, however, that this interpretation is in line with our initial explanation that the non-feature-specific elevated responses likely reflect a general facilitation of neural processes for any upcoming stimuli, rather than being tied to specific stimuli. The rhythmic entrainment mechanism offers another way to understand how the temporal structure of the sequences might contribute to the non-feature-specific elevated responses.

      We have revised the Discussion section to incorporate this interpretation, providing a more comprehensive account for the non-feature-specific elevated responses (Lines 428 – 439).

      Reviewer #2 (Recommendations for the authors): 

      The main problem with the paper is that the sophisticated statistical methodology is not explained well and therefore its validity is hard to evaluate. I am not at all saying it is incorrect, what I am saying is that given how it is explained, it cannot be evaluated.  

      See below for detailed point-by-point responses.  

      The first part is clear. There are 4 directions of motion, and there can also be a blank screen, so chance decoding accuracy would be 20%. The decoding methods from the sensors yielded a little above 50% accuracy. This is clearly above chance, but much less than one would get from electrode recordings of motion-selective cells in the cortex. However, the concept and methods used here seem clear, in contrast to what comes next.  

      Indeed, in the first step, we aimed to validate the reliability of our decoding model by applying a leave-one-out cross-validation scheme to the localizer data. Our results showed that decoding accuracy exceeded 50%, well above the 20% chance level, demonstrating robust decoding performance. However, because MEG is noninvasive and has low spatial resolution, the recorded signals reflect population-level activity that inherently includes more noise than electrode recordings of motion-selective neurons. The decoding accuracy in our study is therefore understandably lower than that obtained with electrode recordings.

      Next, and most of the paper relies on this concept, they use the term decoding probability (Figure 2). What is the decoding probability measure (Turner 2023)? This is not explained in the methods section. I scanned the Turner et al 2023 paper referenced and could not find the term decoding probability there. In short, I have no idea what this means. What are these numbers between 0-0.3? How does this relate to accuracies above 50% reported? This is an important concept here, and it is used throughout the paper, so it makes it hard to evaluate the paper.  

      We apologize for the lack of clarity in our explanation of the term "decoding probability." Specifically, we used a one-versus-rest Lasso logistic regression model trained on the localizer data to decode the MEG signal patterns elicited by each motion direction during the main phase. The trained model can predict a single label at each time point for each trial (e.g., labels 1–4 correspond to the four motion directions and label 5 corresponds to the ITI period). By comparing the predicted labels with the true labels across test trials, we computed the time-resolved decoding accuracy that we report.

      Alternatively, rather than predicting a single label for each time point and each trial, the model can also output the probabilities associated with each label/class (e.g., we used the predict_proba function in scikit-learn). This results in a 5-column output, where each column represents the probability of the corresponding class, and the sum of the probabilities across the five columns equals 1. Finally, at each time point, averaging these probabilities across trials yields five values that indicate the likelihood of the predicted stimulus belonging to each class.
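
      A minimal sketch of how trial-averaged decoding probabilities can be obtained with scikit-learn, assuming an L1-penalized (Lasso) logistic regression wrapped in `OneVsRestClassifier`. The function name, array shapes, and regularization strength `C` are illustrative assumptions rather than the authors' settings. In the single-label multiclass case, the rows returned by `OneVsRestClassifier.predict_proba` sum to 1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier


def decoding_probabilities(X_train, y_train, X_test, C=1.0):
    """Trial-averaged class probabilities at each test time point (illustrative).

    X_train: (n_train_trials, n_sensors) localizer patterns at one time point.
    y_train: (n_train_trials,) labels, e.g. 0-3 for the four directions and 4 for the ITI.
    X_test:  (n_test_trials, n_times, n_sensors) main-phase data.
    Returns an (n_times, n_classes) array.
    """
    # One-versus-rest Lasso (L1-penalized) logistic regression.
    model = OneVsRestClassifier(LogisticRegression(penalty="l1", solver="liblinear", C=C))
    model.fit(X_train, y_train)
    n_trials, n_times, n_sensors = X_test.shape
    # One probability vector per (trial, time point); each row sums to 1 across classes.
    proba = model.predict_proba(X_test.reshape(-1, n_sensors))
    proba = proba.reshape(n_trials, n_times, -1)
    # Averaging across trials gives one probability per class at each time point.
    return proba.mean(axis=0)
```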

      For example, Figure 2 in the manuscript depicts the decoding probabilities for the four RDKs (the probabilities for the ITI class are not shown in the figure). The number in a cell (between 0 and 0.3) indicates the probability of each class at a given time point (Figure 2A). The decoding probability does not have a direct relationship with the decoding accuracy. However, since there are five classes, the chance level of the decoding probability is 0.2. The highest probability among the five classes at a given time point determines the decoded label when computing the decoding accuracy.

      For illustration, in the left panel of Figure 2B, at the onset of the first RDK (0 s), the mean decoding probabilities for the classes 0°, 90°, 180°, 270°, and the blank ITI are 5%, 4.1%, 4.0%, 4.5%, and 82.4%, respectively. Thus, the decoded label should be the blank ITI. In contrast, 0.4 s after the onset of the first RDK, the mean decoding probabilities for the five classes are 28.0%, 19.0%, 22.8%, 21.2%, and 9.0%, respectively. Therefore, the decoded label should be 0°.
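
      The worked example above can be checked directly: the decoded label is the class with the highest probability, and chance for five classes is 1/5. The probability values below are the ones quoted for Figure 2B; the code itself is only illustrative.

```python
import numpy as np

CLASSES = ["0°", "90°", "180°", "270°", "ITI"]
CHANCE = 1 / len(CLASSES)  # 0.2 for five classes

# Trial-averaged decoding probabilities quoted for the left panel of Figure 2B.
p_onset = np.array([0.050, 0.041, 0.040, 0.045, 0.824])  # at first-RDK onset (0 s)
p_400ms = np.array([0.280, 0.190, 0.228, 0.212, 0.090])  # 0.4 s after onset


def decoded_label(probabilities):
    """The decoded label is the class with the highest probability."""
    return CLASSES[int(np.argmax(probabilities))]


for name, p in [("0 s", p_onset), ("0.4 s", p_400ms)]:
    print(name, decoded_label(p), round(float(p.sum()), 3))
# 0 s ITI 1.0
# 0.4 s 0° 1.0
```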

      We have revised the Methods section to explain this issue (Lines 646 – 657).

      They did find compressed reversed replay events (Figures 3-4). This is again confusing for several reasons. First, because they use the same unexplained decoding probability measure. Second, the optimal time point defined above depends on the start time of a stimulus, but here the start time is random. Third, the TDLM algorithm is hard to understand. For example, what are the reactivation probabilities of Figure 3C? They do make an effort to explain this in the methods section (lines 652-697), but it is not clear enough from the outset. For example, what is the state X_j? Is it a vector of sensor activity? Are these decoding probabilities of the different directions? What is it? Also, what is X_i vs X_i(\Delta t)? Frankly, despite their efforts, I am very confused. Additionally, the figures use the term reactivation probability; where is it defined? So again, the results seem interesting, but the methods are not explained well at all.  

      This paper must better explain the statistical methods so that they can be evaluated. This is not easy, these are relatively complex methods, but they must be explained much better so the validity of the paper can be examined.  

      Regarding the optimal time point, we defined it as the time point with the highest decoding accuracy, determined during the validation of the localizer data using a leave-one-out cross-validation scheme. This optimal time point was participant- and motion-direction-specific, as the latency to achieve the peak decoding accuracy varied across individuals and motion directions. For group-level visualization, we circularly shifted the data over time, aligning each optimal time point to a common reference point (arbitrarily set at 200 ms after stimulus onset). Importantly, however, these time points are unrelated to the data in the main phase, as the models were trained using the independent localizer data and then applied to each time point during the blank period in the main phase.
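
      The circular-shift alignment can be sketched with `numpy.roll`, assuming a uniformly sampled accuracy time course per participant and direction and taking the optimal time point as the sample with peak accuracy. The function name is hypothetical; the 200 ms reference follows the text.

```python
import numpy as np


def align_peak_to_reference(accuracy, times, reference_time=0.2):
    """Circularly shift a decoding-accuracy time course so that its peak
    (the 'optimal time point') lands at reference_time (seconds).

    accuracy: (n_times,) accuracy for one participant and one motion direction.
    times:    (n_times,) uniformly spaced sample times in seconds.
    Returns the shifted curve and the shift in samples, so the same shift
    can be applied to other data from that participant and direction.
    """
    i_opt = int(np.argmax(accuracy))                        # optimal time point
    i_ref = int(np.argmin(np.abs(times - reference_time)))  # sample nearest 200 ms
    shift = i_ref - i_opt
    return np.roll(accuracy, shift), shift
```

      Because the shift is circular, samples pushed past one end of the window reappear at the other end; this affects only the group-level visualization and not the models applied to the main phase.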

      Regarding the TDLM algorithm, detailed descriptions of the algorithm have been provided in the revised Methods section (Line 683 – 735). Furthermore, we have included explanatory notes in the main text and figure legend to provide immediate context for terms such as "reactivation probability" (Lines 247 – 248, Lines 275 – 276).
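
      For readers following the TDLM description, the sketch below shows the two-level regression in its commonly described form (Liu et al., 2019). In that formulation, X is a time × state matrix of reactivation probabilities (one column per direction-specific classifier), and X(Δt) is the same matrix shifted by Δt samples. The first level regresses X(Δt) on X to obtain an empirical state-to-state transition matrix. The second level scores how well that matrix matches the forward and backward transitions of the exposed sequence. This is a simplified illustration, not the authors' pipeline: it omits additional confound regressors and significance testing, and the function name and inputs are hypothetical.

```python
import numpy as np


def tdlm_sequenceness(X, T_forward, max_lag):
    """Simplified two-level TDLM (illustrative sketch).

    X:         (n_time, n_states) reactivation probabilities for the blank period.
    T_forward: (n_states, n_states) binary matrix, T_forward[i, j] = 1 if
               state j follows state i in the exposed sequence.
    max_lag:   largest time lag to test, in samples.
    Returns (fwd, bwd): forward and backward sequenceness for lags 1..max_lag.
    """
    n_states = X.shape[1]
    # Second-level design: forward, backward, self-transition and constant templates.
    design = np.column_stack([
        T_forward.ravel(),
        T_forward.T.ravel(),
        np.eye(n_states).ravel(),
        np.ones(n_states * n_states),
    ])
    fwd = np.zeros(max_lag)
    bwd = np.zeros(max_lag)
    for lag in range(1, max_lag + 1):
        X_now = X[:-lag]    # X_i(t)
        X_later = X[lag:]   # X_j(t + lag), i.e. X shifted by the lag
        # First level: regress every lagged state on all current states plus an intercept.
        predictors = np.column_stack([X_now, np.ones(len(X_now))])
        coefs, *_ = np.linalg.lstsq(predictors, X_later, rcond=None)
        beta = coefs[:n_states]  # beta[i, j]: empirical i -> j transition strength
        # Second level: project the empirical transition matrix onto the templates.
        z, *_ = np.linalg.lstsq(design, beta.ravel(), rcond=None)
        fwd[lag - 1], bwd[lag - 1] = z[0], z[1]
    return fwd, bwd
```

      Larger backward than forward values at a given lag indicate reverse sequenceness at that lag. In TDLM, significance is typically assessed against permutations of the state labels, which this sketch does not implement.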

      This paper uses MEG in humans, a non-invasive technique. This allows for such results in humans. Indeed (if the methods are correct) these units can be decoded to provide statistically significant estimates of motion direction. Note, however, that the spatial resolution of MEG is limited. The decoding accuracies of above 50% are way above chance. Note however that if actual motion-sensitive neurons (e.g. area MT) were recorded, and even if the motion is far from 100% coherence, the decoding accuracy would approach 100%. 

      We agree with the reviewer that decoding accuracy would approach 100% if single-neuron data from motion-sensitive areas (e.g., area MT) were recorded, given the exceptionally high signal-to-noise ratio (SNR) of such data. However, two considerations inform the methodology of our study.

      First, while single-neuron recordings provide invaluable insights, acquiring such data in humans is both ethically challenging and logistically impractical.

      Non-invasive MEG, by contrast, offers a practical alternative that can achieve robust decoding of population-level activity with a reasonable SNR.

      Second, the primary goal of our study was not merely to achieve high decoding accuracy but also to examine the replay of an exposed motion sequence in the human visual cortex. To achieve this, we first needed to train feature-specific models that can be used to decode the spontaneous reactivations of the four motion directions during the blank period. The ability to distinguish representations of the four motion directions was essential for calculating the “sequenceness” of the exposed motion sequence in the TDLM algorithm. While the absolute decoding accuracy of MEG data may not match that of single-neuron data, an important outcome was the successful construction of feature-specific models for the four motion directions (Figure 3B in the manuscript). These models provided a robust foundation for investigating sequential replay in the brain. These results also align with the broader goal of leveraging MEG data to study dynamic neural processes in humans, even in the face of its spatial resolution limitation.

      Minor:  

      (1) Line 246 - there is no figure S2A, subplots are not labeled.  

      We have corrected this in the revised manuscript.

      (2) Is Figure 3B referred to in the text? Same for 3C. This figure is there for explaining the statistical models used, but it is not well utilized.

      We have modified the text to clarify this issue in the revised manuscript.

      (3) English:  

      There are problems with the use of English in the paper; these should be corrected in the next version. A few examples are below.  

      Noises -> noise  

      - "along the motion path in visual cortex" What does this sentence mean? Is this referring to motion-sensitive areas in the brain? Please clarify.  

      There are many other examples. This is minor, but should be corrected.

      We have corrected these errors in the revised manuscript.

      References

      Barne, L. C., Cravo, A. M., de Lange, F. P., & Spaak, E. (2022). Temporal prediction elicits rhythmic preactivation of relevant sensory cortices. European Journal of Neuroscience, 55(11–12), 3324–3339. https://doi.org/10.1111/ejn.15405

      Ekman, M., Kusch, S., & de Lange, F. P. (2023). Successor-like representation guides the prediction of future events in human visual cortex and hippocampus. eLife, 12, e78904. https://doi.org/10.7554/eLife.78904

      Gillespie, A. K., Maya, D. A. A., Denovellis, E. L., Liu, D. F., Kastner, D. B., Coulter, M. E., Roumis, D. K., Eden, U. T., & Frank, L. M. (2021). Hippocampal replay reflects specific past experiences rather than a plan for subsequent choice. Neuron, 109(19), 3149-3163.e6. https://doi.org/10.1016/j.neuron.2021.07.029

      Gridchyn, I., Schoenenberger, P., O’Neill, J., & Csicsvari, J. (2020). Assembly-Specific Disruption of Hippocampal Replay Leads to Selective Memory Deficit. Neuron, 106(2), 291-300.e6. https://doi.org/10.1016/j.neuron.2020.01.021

      Herrmann, B., Henry, M. J., Haegens, S., & Obleser, J. (2016). Temporal expectations and neural amplitude fluctuations in auditory cortex interactively influence perception. NeuroImage, 124, 487–497. https://doi.org/10.1016/j.neuroimage.2015.09.019

      Igata, H., Ikegaya, Y., & Sasaki, T. (2021). Prioritized experience replays on a hippocampal predictive map for learning. Proceedings of the National Academy of Sciences, 118(1), e2011266118. https://doi.org/10.1073/pnas.2011266118

      Lakatos, P., Karmos, G., Mehta, A. D., Ulbert, I., & Schroeder, C. E. (2008). Entrainment of Neuronal Oscillations as a Mechanism of Attentional Selection. Science, 320(5872), 110–113. https://doi.org/10.1126/science.1154735

      Lakatos, P., Musacchia, G., O’Connel, M. N., Falchier, A. Y., Javitt, D. C., & Schroeder, C. E. (2013). The Spectrotemporal Filter Mechanism of Auditory Selective Attention. Neuron, 77(4), 750–761. https://doi.org/10.1016/j.neuron.2012.11.034

      Liu, Y., Dolan, R. J., Kurth-Nelson, Z., & Behrens, T. E. J. (2019). Human Replay Spontaneously Reorganizes Experience. Cell, 178(3), 640-652.e14. https://doi.org/10.1016/j.cell.2019.06.012

      Ólafsdóttir, H. F., Bush, D., & Barry, C. (2018). The Role of Hippocampal Replay in Memory and Planning. Current Biology, 28(1), R37–R50. https://doi.org/10.1016/j.cub.2017.10.073

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Overall I found the approach taken by the authors to be clear and convincing. It is striking that the conclusions are similar to those obtained in a recent study using a different computational approach (finite state controllers), and lends confidence to the conclusions about the existence of an optimal memory duration. There are a few questions that could be expanded on in future studies:

      (1) Spatial encoding requirements

      The manuscript contrasts the approach taken here (reinforcement learning in a gridworld) with strategies that involve a "spatial map" such as infotaxis. However, the gridworld navigation algorithm has an implicit allocentric representation, since movement can be in one of four allocentric directions (up, down, left, right), and wind direction is defined in these coordinates. Future studies might ask if an agent can learn the strategy without a known wind direction if it can only go left/right/forward/back/turn (in egocentric coordinates). In discussing possible algorithms, and the features of this one, it might be helpful to distinguish (1) those that rely only on egocentric computations (run and tumble), (2) those that rely on a single direction cue such as wind direction, (3) those that rely on allocentric representations of direction, and (4) those that rely on a full spatial map of the environment.

      We agree that the question of which orientation skills are needed to implement an algorithm is interesting. Note that our agents do not use allocentric directions in the sense of north, east, south and west relative to, e.g., fixed landmarks in the environment. Instead, directions are defined relative to the mean wind, which is assumed fixed and known. (In our first answer to the reviewers we used “north east south west relative to mean wind”, which may have caused confusion; in the manuscript, however, we only use upwind, downwind and crosswind.)

      (2) Recovery strategy on losing the plume

      The authors explore several recovery strategies upon losing the plume, including backtracking, circling, and learned strategies, finding that a learned strategy is optimal. As insects show a variety of recovery strategies that can depend on the model of locomotion, it would be interesting in the future to explore under which conditions various recovery strategies are optimal and whether they can predict the strategies of real animals in different environments.

      Agreed; it will be interesting to study systematically the emergence of distinct recovery strategies and to compare them with those of living organisms.

      (3) Is there a minimal representation of odor for efficient navigation?

      The authors suggest that the number of olfactory states could potentially be reduced to reduce computational cost. They show that reducing the number of olfactory states to 1 dramatically reduces performance. In the future it would be interesting to identify optimal internal representations of odor for navigation and to compare these to those found in real olfactory systems. Does the optimal number of odor and void states depend on the spatial structure of the turbulence as explored in Figure 5?

      We agree that minimal odor representations are an intriguing question. While tabular Q learning cannot derive optimal odor representations systematically, one could expand on the approach we have taken here and provide more comparisons. It will be interesting to follow this approach in a future study.

      Reviewer #2 (Public review):

      Summary:

      The authors investigate the problem of olfactory search in turbulent environments using artificial agents trained using tabular Q-learning, a simple and interpretable reinforcement learning (RL) algorithm. The agents are trained solely on odor stimuli, without access to spatial information or prior knowledge about the odor plume's shape. This approach makes the emergent control strategy more biologically plausible for animals navigating exclusively using olfactory signals. The learned strategies show parallels to observed animal behaviors, such as upwind surging and crosswind casting. The approach generalizes well to different environments and effectively handles the intermittency of turbulent odors.

      Strengths:

      * The use of numerical simulations to generate realistic turbulent fluid dynamics sets this paper apart from studies that rely on idealized or static plumes.

      * A key innovation is the introduction of a small set of interpretable olfactory states based on moving averages of odor intensity and sparsity, coupled with an adaptive temporal memory.

      * The paper provides a thorough analysis of different recovery strategies when an agent loses the odor trail, offering insights into the trade-offs between various approaches.

      * The authors provide a comprehensive performance analysis of their algorithm across a range of environments and recovery strategies, demonstrating the versatility of the approach.

      * Finally, the authors list an interesting set of real-world experiments based on their findings, that might invite interest from experimentalists across multiple species.

      Weaknesses:

      * Using tabular Q-learning is both a strength and a limitation. It's simple and interpretable, making it easier to analyze the learned strategies, but the discrete action space seems somewhat unnatural. In real-world biological systems, actions (like movement) are continuous rather than discrete. Additionally, the ground-frame actions may not map naturally to how animals navigate odor plumes (e.g. insects often navigate based on their own egocentric frame).

      We agree with the reviewer and look forward to studying this problem further, to make the model suitable for meaningful comparisons with animal behavior.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have addressed my major concerns and I support publication of this interesting manuscript. A couple of small suggestions:

      (1) In discussing performance in different environments (line 328-362) it might be easier to read if you referred to the environments by descriptive names rather than numbers.

      Thank you for the suggestion, which we implemented.

      (2) Line 371: measurements of flow speed depend on antennae in insects. Insects can measure the local speed and direction of flow using their antennae (e.g. Bell and Kramer, 1979; Suver et al., 2019; Okubo et al., 2020).

      Thank you for the references.

      (3) line 448: "Similarly, an odor detection elicits upwind surges that can last several seconds" maybe "Similarly, an odor detection elicits upwind surges that can outlast the odor by several seconds"?

      Thank you for the suggestion.

      Reviewer #2 (Recommendations for the authors):

      I commend the authors for their revisions in response to reviewer feedback.

      While I appreciate that the manuscript is now accompanied by code and data, I must note that the accompanying code-repository lacks proper instructions for use and is likely incomplete (e.g. where is the main function one should run to run your simulations? How should one train? How should one recreate the results? Which data files go where?).

      For examples of high-quality code-release, please see the documentation for these RL-for-neuroscience code repositories (from previously published papers):

      https://github.com/ryzhang1/Inductive_bias

      https://github.com/BruntonUWBio/plumetracknets

      The accompanying data does provide snapshots from their turbulent plume simulations, which should be valuable for future research.

      Thank you for the suggestions on how to improve the clarity of the code. We designed the repository to serve both for developing the code and for sharing it, because we are going to build on this work. Nothing is missing from the repository (we know this because it is what we actually use).

      We do plan to create a more user-friendly version of the code, hopefully within the next few months, but it won’t be immediate, as we also aim to integrate other aspects of the work we are currently doing in the lab. The Brunton repository is very well organized; thanks for the pointer.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Overall I found the approach taken by the authors to be clear and convincing. It is striking that the conclusions are similar to those obtained in a recent study using a different computational approach (finite state controllers), and lend confidence to the conclusions about the existence of an optimal memory duration. There are a few points or questions that could be addressed in greater detail in a revision:

      (1) Discussion of spatial encoding

      The manuscript contrasts the approach taken here (reinforcement learning in a grid world) with strategies that involve a "spatial map" such as infotaxis. The authors note that their algorithm contains "no spatial information." However, I wonder if further degrees of spatial encoding might be delineated to better facilitate comparisons with biological navigation algorithms. For example, the gridworld navigation algorithm seems to have an implicit allocentric representation, since movement can be in one of four allocentric directions (up, down, left, right). I assume this is how the agent learns to move upwind in the absence of an explicit wind direction signal. However, not all biological organisms likely have this allocentric representation. Can the agent learn the strategy without wind direction if it can only go left/right/forward/back/turn (in egocentric coordinates)? In discussing possible algorithms, and the features of this one, it might be helpful to distinguish
      (1) those that rely only on egocentric computations (run and tumble),
      (2) those that rely on a single direction cue such as wind direction,
      (3) those that rely on allocentric representations of direction, and
      (4) those that rely on a full spatial map of the environment.

      As Referee 1 points out, even if the algorithm does not require a map of space, the agent still needs to distinguish directions relative to the wind direction, which is assumed to be known. Indeed, although in the manuscript we labeled actions allocentrically as “up”, “down”, “left” and “right”, the source is always placed in the same location; hence “left” corresponds to upwind, “right” to downwind, and “up” and “down” to crosswind right and left. Directions are therefore in fact relative to the mean wind, which is assumed known. We have clarified the spatial encoding required to implement these strategies, and re-labeled the directions as upwind, downwind, crosswind-right and crosswind-left.

      In reality, animals cannot measure the mean flow, only the local flow speed, e.g. with antennae in insects, whiskers in rodents and the lateral line in marine organisms. Further work is needed to address how local flow measurements enable navigation using Q learning.
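      To illustrate what wind-relative action labels mean for the learning rule, here is a minimal sketch of a standard tabular Q-learning step with actions defined relative to the mean wind. It assumes the mean wind blows along +x, so the source lies toward -x (upwind), and maps “up” to crosswind-right and “down” to crosswind-left as described above under an assumed y-up convention. The action names, the `move` and `q_update` helpers and the values of `alpha` and `gamma` are hypothetical choices for illustration, not the manuscript’s implementation.

```python
import numpy as np

# Hypothetical convention: mean wind along +x, source toward -x (upwind);
# "up" (+y) is crosswind-right and "down" (-y) is crosswind-left.
ACTIONS = {
    "upwind": (-1, 0),
    "downwind": (1, 0),
    "crosswind_right": (0, 1),
    "crosswind_left": (0, -1),
}
ACTION_NAMES = list(ACTIONS)


def move(position, action_name):
    """Displace the agent by one grid cell in a wind-relative direction."""
    dx, dy = ACTIONS[action_name]
    return position[0] + dx, position[1] + dy


def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99, done=False):
    """One tabular Q-learning step: Q[s, a] += alpha * (target - Q[s, a])."""
    # Bootstrapped target: no future value once the episode has ended.
    target = reward if done else reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
    return Q
```

      Here `state` would index an olfactory state rather than a position: the agent never sees its coordinates, only which wind-relative step it takes.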

      (2) Recovery strategy on losing the plume

      While the approach to encoding odor dynamics seems highly principled and reaches appealingly intuitive conclusions, the approach to modeling the recovery strategy seems to be more ad hoc. Early in the paper, the recovery strategy is defined to be path integration back to the point at which odor was lost, while later in the paper, the authors explore Brownian motion and a learned recovery based on multiple "void" states. Since the learned strategy works best, why not first consider learned strategies, and explore how lack of odor must be encoded or whether there is an optimal division of void states that leads to the best recovery strategies? Also, although the authors state that the learned recovery strategies resemble casting, only minimal data are shown to support this. A deeper statistical analysis of the learned recovery strategies would facilitate comparison to those observed in biology.

      We thank Referee 1 for their remarks and for the suggestion to give the learned recovery a more prominent role and characterize it better. We agree that what the agent does in the void state is key to turbulent navigation. In the revised manuscript, we have further substantiated the statistics of the learned recovery by repeating training 20 times and comparing the trajectories in the void (Figure 3 figure supplement 3, new Table 1). We believe, however, that starting with the heuristic recovery is clearer because it introduces the concept of recovery more directly. Indeed, the learned “recovery” is so flexible that it ends up mixing recovery (crosswind motion) with aspects of exploitation (surge); we defer a more in-depth analysis that disentangles these two aspects to future work. We also added a new comparison with other biologically inspired recoveries, both in the native environment and for generalization (Figures 3 and 5).

      (3) Is there a minimal representation of odor for efficient navigation?

      The authors suggest (line 280) that the number of olfactory states could potentially be reduced to reduce computational cost. This raises the question of whether there is a maximally efficient representation of odors and blanks sufficient for effective navigation. The authors choose to represent odor by 15 states that allow the agent to discriminate different spatial regimes of the stimulus, and later introduce additional void states that allow the agent to learn a recovery strategy. Can the number of states be reduced or does this lead to loss of performance? Does the optimal number of odor and void states depend on the spatial structure of the turbulence as explored in Figure 5?

      We thank the referee for their comment. Q learning defines the olfactory states prior to training and does not allow a systematic optimization of the odor representation for the task. We can, however, compare different definitions of the olfactory states, for example based on the same features but different discretizations. We added a comparison in which the number of non-empty olfactory states is drastically reduced to just 1: if the odor is above threshold at any time within the memory, the agent is in the non-void olfactory state; otherwise, it is in the void state. This drastic reduction in the number of olfactory states results in less positional information and degrades performance (Figure 5 figure supplement 5).

      The number of void states is already minimal: we chose 50 void states because this matches the time agents typically remain in the void (fewer than 50 void states results in no convergence, and more than 50 introduces states that are rarely visited).

      One may instead resort to deep Q-learning or to recurrent neural networks, which, however, do not reveal which features or olfactory states drive behavior (see the discussion in the manuscript and the questions below).
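      The reduced scheme described above (a single non-void olfactory state plus a set of void states) can be sketched as follows. As an assumption for illustration, each void state is taken to index the number of time steps spent without any detection in memory, capped at `n_void`; the function name, the parameters and the exact void-state definition are hypothetical and may differ from the manuscript.

```python
from collections import deque


def reduced_olfactory_states(odor_trace, threshold, memory, n_void=50):
    """Map an odor time series to discrete states (hypothetical sketch).

    State 0       : odor above threshold at least once in the last `memory` steps.
    States 1..n_void : void, indexed by steps spent without detections (capped).
    """
    window = deque(maxlen=memory)  # sliding memory of above-threshold detections
    states = []
    steps_in_void = 0
    for concentration in odor_trace:
        window.append(concentration > threshold)
        if any(window):
            steps_in_void = 0
            states.append(0)  # the single non-void olfactory state
        else:
            steps_in_void = min(steps_in_void + 1, n_void)
            states.append(steps_in_void)
    return states
```

      Capping at `n_void` keeps the table finite: every time step beyond the cap is lumped into the last void state, which is why too few void states merge situations the agent should treat differently.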

      Reviewer #2 (Public review):

      Summary:

      The authors investigate the problem of olfactory search in turbulent environments using artificial agents trained using tabular Q-learning, a simple and interpretable reinforcement learning (RL) algorithm. The agents are trained solely on odor stimuli, without access to spatial information or prior knowledge about the odor plume's shape. This approach makes the emergent control strategy more biologically plausible for animals navigating exclusively using olfactory signals. The learned strategies show parallels to observed animal behaviors, such as upwind surging and crosswind casting. The approach generalizes well to different environments and effectively handles the intermittency of turbulent odors.

      Strengths:

      (1) The use of numerical simulations to generate realistic turbulent fluid dynamics sets this paper apart from studies that rely on idealized or static plumes.

      (2) A key innovation is the introduction of a small set of interpretable olfactory states based on moving averages of odor intensity and sparsity, coupled with an adaptive temporal memory.

      (3) The paper provides a thorough analysis of different recovery strategies when an agent loses the odor trail, offering insights into the trade-offs between various approaches.

      (4) The authors provide a comprehensive performance analysis of their algorithm across a range of environments and recovery strategies, demonstrating the versatility of the approach.

      (5) Finally, the authors list an interesting set of real-world experiments based on their findings, that might invite interest from experimentalists across multiple species.

      Weaknesses:

      (1) The inclusion of Brownian motion as a recovery strategy, seems odd since it doesn't closely match natural animal behavior, where circling (e.g. flies) or zigzagging (ants' "sector search") could have been more realistic.

      We agree that Brownian motion may not be biologically plausible; we used it as a simple benchmark. We clarified this point and re-trained our algorithm with adaptive memory using circling and zigzagging (cast and surge) recoveries. The learned recovery outperforms all heuristic recoveries (Figure 3D, metrics G). Circling ranks second, and achieves these good results by further decreasing the probability of failure at a slight cost in speed. When tested in the non-native environments 2 to 6, the learned recovery performs best in environments 2, 5 and 6, i.e. at long range, which is more relevant to flying insects, whereas circling generalizes best in the odor-rich environments 3 and 4, representative of closer range and proximity to the substrate (Figure 5B, metrics G). In the new environments, as in the native environment, circling favors convergence (Figure 5B, metrics f⁺) over speed (Figure 5B, metrics g⁺ and τ_min/τ), which is particularly deleterious at large distance.

      (2) Using tabular Q-learning is both a strength and a limitation. It's simple and interpretable, making it easier to analyze the learned strategies, but the discrete action space seems somewhat unnatural. In real-world biological systems, actions (like movement) are continuous rather than discrete. Additionally, the ground-frame actions may not map naturally to how animals navigate odor plumes (e.g. insects often navigate based on their own egocentric frame).

      We agree with the reviewer that animal locomotion does not look like a series of discrete displacements on a checkerboard. However, to overcome this limitation, one has to first focus on a specific system to define actions in a way that best adheres to a species’ motor controls. Moreover, these actions are likely continuous, which makes reinforcement learning notoriously more complex. While we agree that more realistic models are definitely needed for a comparison with real systems, this remains outside the scope of the current work. We have added a remark to clarify this limitation.

      (3) The lack of accompanying code is a major drawback since nowadays open access to data and code is becoming a standard in computational research. Given that the turbulent fluid simulation is a key element that differentiates this paper, the absence of simulation and analysis code limits the study's reproducibility.

      We have published the code and the datasets at

      - code: https://github.com/Akatsuki96/qNav

      - datasets: https://zenodo.org/records/14655992

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 59-69: In comparing the results here to other approaches (especially the Verano and Singh papers), it would also be helpful to clarify which of these include an explicit representation of the wind direction. My understanding is that both the Singh and Verano approaches include an explicit representation of wind direction. In Singh wind direction is one of the observations that inputs to the agent, while in Verano, the actions are defined relative to the wind direction. In the current paper, my understanding is that there is no explicitly defined wind direction, but because movement directions are encoded allocentrically, the agent is able to learn the upwind direction from the structure of the plume- is this correct? I think this information would be helpful to spell out and also to address whether an agent without any allocentric direction sense can learn the task.

      Thank you for the comment. In our algorithm the directions are defined relative to the mean wind, which is assumed known, as in Verano et al. As far as we understand, Singh et al provide the instantaneous, egocentric wind velocities as part of the input.

      (1) Line 105: "several properties of odor stimuli depend on the distance from the source" might cite Boie...Victor 2018, Ackles...Schaefer, 2021, Nag...van Breugel 2024.

      Thank you for the suggestions; we have added these references.

      (2) Line 130: "we first define a finite set of olfactory states" might be helpful to the reader to state what you chose in this paragraph rather than further down.

      We have slightly modified the opening of the paragraph: we first state that we set out to craft the olfactory states, then describe the challenges, and finally define the olfactory states.

      (3) Line 267: "Note that the learned recovery strategy resembles casting behavior observed in flying insects" Might note that insects seem to deploy a range of recovery strategies depending on locomotor mode and environment. For example, flying flies circle and sink when odor is lost in windless environments (Stupski and van Breugel 2024).

      Thank you for your comment. We have included the reference and we now added comparisons to results using circling and cast & surge recovery strategies.

      (4) Line 289: "from positions beyond the source, the learned strategy is unable to recover the plume as it mostly casts sideways, with little to no downwind action" This is curious as many insects show a downwind bias in the absence of odor that helps them locate the plumes in the first place (e.g. Wolf and Wehner, 2000, Alvarez-Salvado et al. 2018). Is it possible that the agent could learn a downwind bias in the absence of odor if given larger environments or a longer time to learn?

      The reviewer is absolutely correct: downwind motion is not observed in the recovery simply because the agent rarely overshoots the source, so optimization for that condition is washed out in the overall statistics. We believe downwind motion will emerge if an agent needs to avoid overshooting the source; we do not have conclusive results yet, but we plan to introduce such flexibility in future work. We added this remark and the references.

      (5) Line 377-391: testing these ideas in living systems. Interestingly, Kathman..Nagel 2024 (bioRxiv) shows exactly the property predicted here and in Verano in fruit flies- an odor memory that outlasts the stimulus by a duration of several seconds, appropriate for filling in "blanks." Relatedly, Alvarez-Salvado et al. 2018 showed that fly upwind running reflected a temporal integration of odor information over ~10s, sufficient to avoid responding to blanks as loss of odor.

      Indeed, we believe this is the most direct connection between algorithms and experiments. We are excited to discuss this with our colleagues and to pursue a more direct comparison with animal behavior. We were aware of these references but forgot to cite them; thank you for your careful reading of our work!

      Reviewer #2 (Recommendations for the authors):

      Suggestions

      (1) The paper does not clearly specify which type of animals (e.g., flying insects, terrestrial mammals) the model is meant to approximate or not approximate. The authors should consider clarifying how these simulations are suited to be a general model across varied olfactory navigators. Further, it isn't clear how low/high the intermittency studied in this model is compared to what different animals actually encounter. (Minor: The Figure 4 occupancy circles visualization could be simplified).

      Environment 1 represents the lower layers of a moderately turbulent boundary layer. Search occurs on a horizontal plane about half a meter above the ground. The agent is trained at distances of about 10 meters and is also tested at longer distances of about 17 meters (environment 6), at lower heights of about 1 cm above the ground (environments 3 and 4), at lower Reynolds number (environment 5) and with a higher detection threshold (environments 2 and 4). Thus environments 1, 2, 5 and 6 are representative of conditions encountered by flying organisms (or pelagic organisms in water), and environments 3 and 4 of searches near the substrate, potentially involved in terrestrial navigation (benthic in water). Even near the substrate, we consider odor dispersed in the fluid, not odor attached to the substrate (relevant to trail tracking).

      Also note that we pick a Schmidt number Sc = 1, which is appropriate for odors in air but not in water. However, we expect a weak dependence on the Schmidt number, because the Batchelor and Kolmogorov scales are below the size of the source and we are interested in the large-scale statistics (Falkovich et al., 2001; Celani et al., 2014; Duplat et al., 2010).

      Intermittency contours are shown in Fig 1C. Intermittency is highest along the centerline and decays away from it, so that even within the plume detecting odor is relatively rare. Only a thin region near the centerline has intermittency larger than 66%; the outer and most critical bin of the plume has intermittency under 33%; and at the furthest point on the centerline, intermittency is below 10%. For reference, experimental measurements in the atmospheric boundary layer report intermittency of 25% to 20% at 2 to 15 m from the source along the centerline (Murlis and Jones, 1981).

      We have labeled the contours in Fig 1C more clearly, included these remarks, and added a table matching the different environments to real conditions.
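      As an illustration of how intermittency contours such as those in Fig 1C could be computed, here is a minimal sketch. It assumes intermittency at a location is the fraction of time the odor concentration there exceeds the detection threshold, and that simulation snapshots are stored as a (time, y, x) array. The function names are hypothetical; the 33% and 66% bin edges follow the values quoted above.

```python
import numpy as np


def intermittency_map(snapshots, threshold):
    """Fraction of snapshots in which concentration exceeds `threshold`, per grid point.

    snapshots : array of shape (n_time, ny, nx)
    """
    return (np.asarray(snapshots) > threshold).mean(axis=0)


def intermittency_bins(imap, edges=(1 / 3, 2 / 3)):
    """Label each grid point 0 (below 33%), 1 (33% to 66%) or 2 (66% and above)."""
    return np.digitize(imap, edges)
```

      Because the map is a time average at each point, it says nothing about how detections are clustered in time. This is one reason why intermittency alone does not determine which olfactory state an agent is in.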

      (2) Could some biological examples and references be added to support that backtracking is a biologically plausible mechanism?

      Backtracking has been observed, e.g., in ants displaced in unfamiliar environments (Wystrach et al., P Roy Soc B, 280, 2013), in tsetse flies executing reverse turns uncorrelated with wind, which bring them back towards the location where they last detected odor (Torr, Phys Entom, 13, 1988; Gibson & Brady, Phys Entom, 10, 1985), and in cockroaches upon loss of contact with the plume (Willis et al., J. Exp. Biol. 211, 2008). It is also used in computational models of olfactory navigation (Park et al., Plos Comput Biol, 12:e1004682, 2016).

      (3) Hand-crafted features can be both a strength and a limitation. On the one hand, they offer interpretability, which is crucial when trying to model biological systems. On the other hand, they may limit the generality of the model. A more thorough discussion of this paper's limitations should address this.

      (4) The authors mention the possibility of feature engineering or using recurrent neural networks, but a more concrete discussion of these alternatives and their potential advantages/disadvantages would be beneficial. It should be noted that the hand-engineered features in this manuscript are quite similar to what the model of Singh et al suggests emerges in their trained RNNs.

      Merged answer to points 3 and 4.

      We agree with the reviewer that hand-crafted features are both a strength and a limitation in terms of performance and generality. This was a deliberate choice aimed at stripping the algorithm bare of implicit components, both in terms of features and in terms of memory. Even with these simple features, our model performs well in navigating across different signals, consistent with our previous results showing that these features are a “good” surrogate for positional information.

      To search for the most effective temporal features, one may consider a more systematic hand crafting, scaling up our approach. In this case one would first define many features of the odor trace; rank groups of features for their accuracy in regression against distance; train Q learning with the most promising group of features and rank again. Note however that this approach will be cumbersome because multiple factors will have to be systematically varied: the regression algorithm; the discretization of the features and the memory.

      Alternatively, to eliminate hand crafting altogether and seek better performance or generalization, one may consider replacing these hand-crafted features and the tabular Q-learning approach with recurrent neural networks or with finite state controllers. On the flip side, neither of these algorithms will directly provide the most effective features or the best memory, because these properties are hidden within the optimized parameters. Extra work is therefore needed to interrogate the algorithms and extract this information. For example, in Singh et al., the principal components of the hidden states in trained agents correlate with head direction, odor concentration and time since the last odor encounter. More work is needed to move beyond correlations and establish more systematically which features drive behavior in the RNN.

      We have added these points to the discussion.
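      The systematic hand-crafting procedure outlined above involves defining candidate features of the odor trace and then ranking groups of features by how well they regress against distance. Its ranking step could start from something like the sketch below. Ordinary least squares with a single train/test split stands in for “the regression algorithm”, which the response leaves open. The function name and the feature names in the usage are hypothetical.

```python
import itertools

import numpy as np


def rank_feature_groups(features, distance, names, group_size=2, train_frac=0.7):
    """Rank groups of odor features by held-out R^2 of a linear regression on distance.

    features : (n_samples, n_features) candidate features of the odor trace
    distance : (n_samples,) distance from the source
    Returns a list of (r2, feature_names) sorted from best to worst.
    """
    n = len(distance)
    n_train = int(train_frac * n)
    y_test = distance[n_train:]
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    results = []
    for group in itertools.combinations(range(features.shape[1]), group_size):
        # Design matrix: the selected features plus an intercept column.
        X = np.column_stack([features[:, list(group)], np.ones(n)])
        coef, *_ = np.linalg.lstsq(X[:n_train], distance[:n_train], rcond=None)
        resid = y_test - X[n_train:] @ coef
        r2 = 1.0 - np.sum(resid ** 2) / ss_tot
        results.append((r2, tuple(names[i] for i in group)))
    return sorted(results, key=lambda item: item[0], reverse=True)
```

      The top-ranked groups would then be discretized into olfactory states and used to train Q learning, and the ranking repeated. Every choice fixed here (the regression model, the split, the group size) is one of the factors that would have to be varied systematically.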

      (5) Minor: the title of the paper doesn't immediately signal its focus on recovery strategies and their interplay with memory in the context of olfactory navigation. Given the many other papers using a similar RL approach, this might help the authors position this paper better.

      We agree with the referee and have modified the title to reflect this.

      (6) Minor: L 331: "because turbulent odor plumes constantly switch on and off" -- the signal received rather than the plume itself is switching on and off.

      Thank you for the suggestion, we implemented it.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the study "Re-focusing visual working memory during expected and unexpected memory tests" by Sisi Wang and Freek van Ede, the authors investigate the dynamics of attentional re-orienting within visual working memory (VWM). Utilizing a robust combination of behavioral measures, electroencephalography (EEG), and eye tracking, the research presents a compelling exploration of how attention is redirected within VWM under varying conditions. The research question addresses a significant gap in our understanding of cognitive processes, particularly how expected and unexpected memory tests influence the focus and re-focus of attention. The experimental design is meticulously crafted, enabling a thorough investigation of these dynamics. The figures presented are clear and effectively illustrate the findings, while the writing is concise and accessible, making the complex concepts understandable. Overall, this study provides valuable insights into the mechanisms of visual working memory and attentional re-orienting, contributing meaningfully to the field of cognitive neuroscience. Despite the strengths of the manuscript, there are several areas where improvements could be made.

      We thank the reviewer for this summary and positive appraisal of our study and our findings. In addition, we are of course grateful for the excellent suggestions for improvements that we have embraced to further strengthen our article. 

      Microsaccades or Saccades?

      In the manuscript, the terms "microsaccades" and "saccades" are used interchangeably. For instance, "microsaccades" are mentioned in the keywords, whereas "saccades" appear in the results section. It is crucial to differentiate between these two concepts. Saccades are large, often deliberate eye movements used for scanning and shifting attention, while microsaccades are small, involuntary movements that maintain visual perception during fixation. The authors note the connection between microsaccades and attention, but it is not well-recognized that saccades are directly linked to attention. Although the paradigm involves a fixation point, the authors do not clarify whether large eye movements (saccades) were excluded from the analysis. If large eye movements were removed during data processing, this should be documented in the manuscript, including clear definitions of "microsaccades" and "saccades." If such trials were not removed, the contribution of large eye movements to the results should be shown, and an explanation provided as to why they should be considered.

      We thank the reviewer for raising this relevant point. Before turning to this relevant distinction, we first wish to clarify how, for our main aim of tracking the dynamics of ‘re-orienting in working memory’, any spatial modulation in gaze – be it driven by micro- or macro-saccades – suits this purpose. Having made this explicit, we also fully agree that disambiguating the nature of the saccade bias during internal focusing has additional value.

      Because it is notoriously challenging (or at least inherently arbitrary) to draw an absolute fixed boundary between macro- and microsaccades, we instead adopted a two-stage approach to our analysis (building on prior studies from our lab, e.g., de Vries et al., 2023; Liu et al., 2023; Liu et al., 2022). In the first step, we analysed spatial biases in all detected saccades regardless of their size (hence our labelling of them as “saccades” when describing these analyses). In a second step, we decomposed and visualized the saccade-rate effect as a function of saccade size in degrees. This second stage directly exposed the ‘nature’ of the saccade bias, as we visualized in Figure 2c (with time on the x axis, saccade size on the y axis, and the spatial modulation color coded). Because these visualizations directly address this major comment, we have now made this key set of results much clearer in our work (we agree that our original visualization of this key aspect of our data was suboptimal). In addition, we have added a similar plot for the saccade data in the test phase in Supplementary Figure S2b.

      These complementary analyses show that the saccade bias (more toward than away saccades) is indeed predominantly driven by small saccades (hence our labelling of them as “micro-saccades” when interpreting our findings), and less so by larger saccades associated with looking all the way back to the location where the memory item had been presented at encoding (positioned at 6 degrees). This is important as it helps to arbitrate between fixational/micro-saccadic eye-movement biases (previously associated with covert and internal attention shifts; cf. de Vries et al., 2023; Engbert and Kliegl, 2003; Hafed and Clark, 2002; Liu et al., 2023; Liu et al., 2022) vs. larger eye movements back to the original locations of the item (previously associated with ‘looking at nothing’ during memory retrieval and imagery; cf. Brandt and Stark, 1997; Ferreira et al., 2008; Johansson and Johansson, 2014; Laeng et al., 2014; Martarelli and Mast, 2013; Spivey and Geng, 2001). By adopting this visualization, we can show this while preserving the richness of our data, and without having to set a priori an (inherently arbitrary) threshold for classifying saccades as either “macro” or “micro”.
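      To make the saccade-size-over-time decomposition concrete, the sketch below bins already-detected saccades by size and by time and returns the toward-minus-away saccade rate (in Hz) per cell. It is an illustrative sketch rather than our exact pipeline: saccade detection, the toward/away labelling relative to the memorized item, the bin edges, and the absence of smoothing are all assumptions.

```python
import numpy as np


def size_by_time_bias(times_ms, sizes_deg, is_toward, n_trials,
                      time_edges_ms, size_edges_deg):
    """Toward-minus-away saccade rate (Hz) in each (size bin, time bin) cell."""
    times_ms = np.asarray(times_ms, dtype=float)
    sizes_deg = np.asarray(sizes_deg, dtype=float)
    is_toward = np.asarray(is_toward, dtype=bool)
    bin_s = np.diff(time_edges_ms) / 1000.0  # time-bin widths in seconds

    def rate(mask):
        # Rows index saccade-size bins, columns index time bins.
        counts, _, _ = np.histogram2d(sizes_deg[mask], times_ms[mask],
                                      bins=[size_edges_deg, time_edges_ms])
        return counts / (n_trials * bin_s[np.newaxis, :])

    return rate(is_toward) - rate(~is_toward)


# Toy data: 10 trials, three toward and one away saccade of 0.5 deg around 300 ms.
time_edges = np.arange(0, 1001, 100)  # 100-ms bins from 0 to 1000 ms
size_edges = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
bias = size_by_time_bias([300, 310, 320, 330], [0.5, 0.5, 0.5, 0.5],
                         [True, True, True, False], n_trials=10,
                         time_edges_ms=time_edges, size_edges_deg=size_edges)
# (3 - 1) saccades / (10 trials * 0.1 s) = 2.0 Hz in the 0-1 deg, 300-400 ms cell.
print(bias[0, 3])
```

      Plotting `bias` with time on the x axis and size on the y axis gives the kind of map described for Figure 2c.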

      Having explained our rationale, we nevertheless agree with the reviewer that it is worth showing how our time-course results hold up when only considering eye movements below 2 visual degrees, which we consider “fixational” given that our memory stimuli at encoding were presented at 6 visual degrees from central fixation. We show this in Supplementary Figure S1. As can be seen below, our main saccade-bias results remain almost the same when restricting our analyses exclusively to fixational saccades within 2 degrees, both when considering our data after the retrocue (Supplementary Figure S1a) and after the memory test (Supplementary Figure S1b).
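      Restricting the analysis to fixational saccades amounts to filtering by saccade size before computing the bias time course. A minimal sketch, assuming detected saccades with onset times relative to the event of interest, 1-ms bins, and a 100-ms moving-average smoothing window (the window length and analysis range are hypothetical choices, not taken from our Methods):

```python
import numpy as np


def saccade_bias_timecourse(times_ms, sizes_deg, is_toward, n_trials,
                            t_min=-200, t_max=1500, max_size_deg=None,
                            smooth_ms=100):
    """Toward-minus-away saccade rate (Hz) at 1-ms resolution, optionally size-limited."""
    times_ms = np.asarray(times_ms, dtype=float)
    sizes_deg = np.asarray(sizes_deg, dtype=float)
    is_toward = np.asarray(is_toward, dtype=bool)
    keep = np.ones(len(times_ms), dtype=bool)
    if max_size_deg is not None:
        keep &= sizes_deg < max_size_deg  # e.g. 2 deg for the "fixational" range
    edges = np.arange(t_min, t_max + 1)  # 1-ms bin edges

    def rate(mask):
        counts, _ = np.histogram(times_ms[keep & mask], bins=edges)
        return counts / n_trials * 1000.0  # events per trial per second

    bias = rate(is_toward) - rate(~is_toward)
    kernel = np.ones(smooth_ms) / smooth_ms  # moving-average smoothing
    return edges[:-1], np.convolve(bias, kernel, mode="same")


# One trial with two toward saccades at 400 ms: one fixational (0.5 deg), one large (4 deg).
t, fix_bias = saccade_bias_timecourse([400, 400], [0.5, 4.0], [True, True],
                                      n_trials=1, max_size_deg=2.0)
_, all_bias = saccade_bias_timecourse([400, 400], [0.5, 4.0], [True, True],
                                      n_trials=1)
# Integrating over time (sum / 1000) gives net toward saccades per trial: ~1 vs ~2.
print(fix_bias.sum() / 1000, all_bias.sum() / 1000)
```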

      Because we agree these are important complementary data, we have now added them as a supplementary figure and added the results to our article. We also point to these additional corroborating findings at key instances in our article:

      Page 5 (Results)

      “As in prior studies from our lab with similar experimental set-ups, internal attentional focusing was predominantly driven by fixational micro-saccades (small, involuntary eye-movements around current fixation). To reveal this in the current study, we decomposed and visualized the observed saccade-rate effect as a function of saccade size (Figure 2c), following the same procedure as we have adopted in other recent studies on this bias (de Vries et al., 2023; Liu et al., 2023; Liu et al., 2022). As shown in the saccade-size-over-time plots in Figure 2c, also in the current study, the difference between toward and away saccades (with red colours denoting more toward saccades) was predominantly driven by fixational saccades in the micro-saccades range (< 2°).”

      “Moreover, as shown in Supplementary Figure S1a, complementary analyses show that our time course (saccade bias) results hold even when exclusively considering eye movements below 2 visual degrees, which we defined as “fixational” given that the memory items were presented 6 visual degrees from fixation during encoding. This further corroborates that the bias observed during internal attentional focusing was predominantly driven by fixational micro-saccades rather than looking back to the encoded location of the memory items (cf. Johansson and Johansson, 2014; Richardson and Spivey, 2000; Spivey and Geng, 2001; Wynn et al., 2019).”

      Page 7 (Results):

      “As shown in the corresponding saccade-size-over-time plots in Supplementary Figure S2b, consistent with what we observed following the cue, the difference between toward and away saccades following the test was again predominantly driven by saccades in the fixational microsaccade range (< 2°), and the time course (saccade bias) results hold even when exclusively considering fixational eye movements below 2 visual degrees (Supplementary Figure S1b). Thus, just like mnemonic focusing after the cue, re-orienting after the memory test was also predominantly reflected in fixational micro-saccades, and not looking back at the original location of the memory items that were encoded at 6 degrees away from central fixation.”

      Alpha Lateralization in Attentional Re-orienting

      In the attentional orienting section of the results (Figure 2), the authors effectively present EEG alpha lateralization results with time-frequency plots and topographic maps. However, in the attentional reorienting section (Figure 3), these visualizations are absent. It is important to note that the time period in attentional orienting differs from attentional re-orienting, and consequently, the time-frequency plots and topographic maps may also differ. Therefore, it may be invalid to compute alpha lateralization without a clear alpha activity difference. The authors should consider including time-frequency plots and topographic maps for the attentional re-orienting period to validate their findings.

      We thank the reviewer also for this constructive suggestion. The reason we did not expand on the time-frequency maps and topographies at the test stage was the relative lack of alpha effects at the test stage (compared to the clearer alpha modulations after the retrocue). Nevertheless, we agree that including these data will increase the transparency and comprehensiveness of our article. We have now added time-frequency plots and topographic maps for alpha lateralization in response to the working-memory test in Supplementary Figure S2. As can be seen, the time-frequency plots and topographies in the re-focusing period after the working-memory test were consistent with our time-series plots in Figure 3a, reinforcing how alpha lateralization is generally not clear following the working-memory test. In accordance with this addition, we added the following to the revised manuscript:

      Page 7 (Results):

      “For complementary time-frequency and topographical visualizations, see Supplementary Figure S2a.”

      Onset and Offset Latency of Saccade Bias

      The use of the 50% peak to determine the onset and offset latency of the saccade bias is problematic. For example, if one condition has a higher peak amplitude than another, the standard for saccade bias onset would be higher, making the observed differences between the onset/offset latencies potentially driven by amplitude rather than the latencies themselves. The authors should consider a more robust method for determining saccade bias onset and offset that accounts for these amplitude differences.

      We thank the reviewer for raising this valuable point. We agree that the calculation of onset and offset latencies of the saccade bias could be influenced by the peak amplitude of the waveforms. We therefore additionally conducted a Fractional Area Latency (FAL) analysis on the comparison of the saccade bias following the working-memory test between valid-cue (expected test) and invalid-cue (unexpected test) trials. FAL analysis has been commonly applied to Event-Related Potentials (ERPs) to estimate the latency of ERP components (Hansen and Hillyard, 1980; Luck, 2005). Instead of relying on the peak, the FAL method calculates latency based on a predefined fraction of the area under the waveform, which can provide a more robust measure of component latency. Applied to our saccade-bias waveforms, this analysis corroborated our original conclusion. Because we believe this is an important complement, we have now added these additional outcomes to our article:

      Page 9 (Results): 

      “We additionally conducted a Fractional Area Latency (FAL) analysis on the comparison of the saccade bias following the memory test between valid- and invalid-cue trials to rule out a potential contribution of peak-amplitude differences to the onset and offset latency differences (Hansen and Hillyard, 1980; Kiesel et al., 2008; Luck, 2005). Consistent with our jackknife-based latency analysis, the FAL analysis revealed a significantly prolonged saccade bias following the unexpected tests (the invalid-cue trials) vs. expected tests (the valid-cue trials) in both 80% and 60% cue-reliability conditions (411 ms vs. 463 ms, t<sub>(14)</sub> = 2.358, p = 0.034; 417 ms vs. 468 ms, t<sub>(15)</sub> = 2.168, p = 0.047; for 80% and 60%, respectively). Again, there was no significant difference in onset latency following unexpected vs. expected tests (346 ms vs. 374 ms, t<sub>(14)</sub> = 2.052, p = 0.060; 353 ms vs. 401 ms, t<sub>(15)</sub> = 1.577, p = 0.136; for 80% and 60%, respectively).”
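      For comparison with the FAL approach, a jackknife-based comparison of 50%-of-peak latencies (cf. Kiesel et al., 2008) can be sketched as follows. This is a generic sketch rather than our analysis code; it assumes that onset and offset are the first and last samples at or above 50% of the grand-average peak, and it scores the leave-one-participant-out grand averages with the jackknife standard error (equivalently, an ordinary t on the leave-one-out values divided by n - 1). The synthetic waveforms are placeholders.

```python
import numpy as np


def fractional_peak_latency(t, wave, fraction=0.5):
    """First and last time points at which `wave` is at or above fraction * peak."""
    above = np.flatnonzero(wave >= fraction * wave.max())
    return t[above[0]], t[above[-1]]


def jackknife_latency_difference(t, waves_a, waves_b, fraction=0.5, which=1):
    """Jackknife comparison of onset (which=0) or offset (which=1) latencies.

    waves_a, waves_b: participants x time arrays (e.g. saccade bias per condition).
    Returns the grand-average latency difference (b - a) and a jackknife-corrected
    t value with n - 1 degrees of freedom.
    """
    n = waves_a.shape[0]

    def diff(a, b):
        return (fractional_peak_latency(t, b.mean(0), fraction)[which]
                - fractional_peak_latency(t, a.mean(0), fraction)[which])

    full = diff(waves_a, waves_b)
    # Latency differences of the leave-one-participant-out grand averages.
    loo = np.array([diff(np.delete(waves_a, i, 0), np.delete(waves_b, i, 0))
                    for i in range(n)])
    se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
    return full, full / se


# Synthetic data: condition b is centred ~60 ms later than condition a.
rng = np.random.default_rng(0)
t = np.arange(0, 1000)
n_subj = 16


def bump(center):
    return np.exp(-0.5 * ((t - center) / 80.0) ** 2)


waves_a = np.array([bump(400 + rng.normal(0, 20)) for _ in range(n_subj)])
waves_b = np.array([bump(460 + rng.normal(0, 20)) for _ in range(n_subj)])
d, t_jk = jackknife_latency_difference(t, waves_a, waves_b, which=1)
print(f"offset difference = {d} ms, jackknife t({n_subj - 1}) = {t_jk:.2f}")
```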

      In accordance, we also added the following to our Methods:

      Page 18 (Methods): 

      “In addition to the jackknife-based latency analysis, we further applied a Fractional Area Latency (FAL) method to the saccade bias comparison between validly and invalidly cued memory tests to rule out a contribution of the peak-amplitude difference to the onset and offset latency difference (Hansen and Hillyard, 1980; Kiesel et al., 2008; Luck, 2005). We defined the onset and offset latency of the saccade bias as the first time points at which 25% and 75%, respectively, of the total area of the component had been reached, relative to a lower boundary of a difference of 0.3 Hz between toward and away saccades (to remove the influence of noise fluctuations in our difference time course below this lower boundary). The extracted onset and offset latencies for all participants were then compared using paired-samples t-tests.”
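      The FAL definition above can be written in a few lines. A minimal sketch, assuming the toward-minus-away difference wave has already been restricted to the post-test analysis window: only the area above the 0.3 Hz boundary is accumulated, and onset and offset are the first samples at which 25% and 75% of that area are reached. The toy waveform is a placeholder.

```python
import numpy as np


def fractional_area_latency(t, bias_hz, floor_hz=0.3, fractions=(0.25, 0.75)):
    """First time points at which the given fractions of the area above floor_hz are reached."""
    # Only the part of the difference wave exceeding the floor contributes to the area.
    above = np.clip(np.asarray(bias_hz, dtype=float) - floor_hz, 0.0, None)
    cum_area = np.cumsum(above)
    if cum_area[-1] == 0:
        return tuple(np.nan for _ in fractions)  # nothing above the floor
    # searchsorted (side="left") gives the first index with cum_area >= target.
    return tuple(t[np.searchsorted(cum_area, f * cum_area[-1])] for f in fractions)


# Toy difference wave: 1.3 Hz from 200 to 599 ms, 0 Hz elsewhere.
t = np.arange(1000)
bias = np.where((t >= 200) & (t < 600), 1.3, 0.0)
onset, offset = fractional_area_latency(t, bias)
print(onset, offset)  # 299 499
```

      Per-participant onsets and offsets obtained this way would then enter the paired-samples t-tests (for example via `scipy.stats.ttest_rel`). Because both latencies are defined relative to each waveform's own total area, scaling a waveform up or down leaves them unchanged, which is the property the reviewer asked for.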

      Control Analysis for Trials Not Using the Initial Cue

      The control analysis for trials where participants did not use the initial cue raises several questions:

      (1) The authors claim that "unlike continuous alpha activity, saccades are events that can be classified on a single-trial level." However, alpha activity can also be analyzed at the single-trial level, as demonstrated by studies like "Alpha Oscillations in the Human Brain Implement Distractor Suppression Independent of Target Selection" by Wöstmann et al. (2019). If single-trial alpha activity can be used, it should be included in additional control analyses.

      We agree with the reviewer that alpha activity can also be analyzed at the single-trial level. However, because alpha is a continuous signal, single-trial alpha activity will necessarily be graded (trials with more or less alpha power). This is still different from saccades, that are not continuous signals but true ‘events’ (either a saccade was made, or no saccade was made, with no continuum in between). Because of this unique property, it is possible to sort trials by whether a saccade was present (and, if present, by its direction), in an all-or-none way that is not possible for alpha activity that can only be sorted by its graded amplitude/power. This is the key distinction underlying our motivation to sort the trials based on saccades, as we now make clearer: 

      Page 10 (Results): 

      “Although alpha can also be analyzed at the single-trial level (e.g. Macdonald et al., 2011; Wöstmann et al., 2019; for a review, see Kosciessa et al., 2020), saccades offer the unique opportunity to split trials not by graded amplitude fluctuations but by discrete all-or-none events.”

      In addition, please note how our saccade markers were also more reliable/sensitive, especially in the subsequent memory-test-phase of interest. This is another reason we decided to focus this control analysis on saccades and not alpha activity. 
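      Because saccades are discrete events, the trial split described above reduces to a Boolean mask over trials. A sketch assuming a table of detected saccades with trial indices, onset times relative to the cue, and toward/away labels; the 200 to 1000 ms window is a hypothetical choice:

```python
import numpy as np


def trials_with_toward_saccade(trial_ids, times_ms, is_toward, n_trials,
                               window_ms=(200, 1000)):
    """Boolean mask over trials: True if at least one toward saccade fell in the window."""
    trial_ids = np.asarray(trial_ids, dtype=int)
    times_ms = np.asarray(times_ms, dtype=float)
    is_toward = np.asarray(is_toward, dtype=bool)
    in_window = (times_ms >= window_ms[0]) & (times_ms < window_ms[1])
    mask = np.zeros(n_trials, dtype=bool)
    mask[trial_ids[in_window & is_toward]] = True  # all-or-none per trial
    return mask


# Four detected saccades across five trials.
mask = trials_with_toward_saccade(trial_ids=[0, 0, 2, 3],
                                  times_ms=[100, 300, 500, 1200],
                                  is_toward=[True, True, False, True],
                                  n_trials=5)
print(mask)  # [ True False False False False]
```

      The post-test saccade bias can then be computed separately for `mask` and `~mask` trials; no comparable all-or-none split exists for graded alpha power.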

      (2) The authors aimed to test whether the re-orienting signal observed after the test is not driven exclusively by trials where participants did not use the initial cue. They hypothesized that "in such a scenario, we should only observe attention deployment after the test stimulus in trials in which participants did not use the preceding retro cue." However, if the saccade bias is the index for attentional deployment, the authors should conduct a statistical test for significant saccade bias rather than only comparing toward-saccade after-cue trials with no-toward-saccade after-cue trials. The null results between the two conditions do not immediately suggest that there is attention deployment in both conditions.

      We thank the reviewer for bringing up this important point. We fully agree and, in fact, we had conducted the relevant statistical analysis for each of the conditions separately (in addition to their comparison). Upon reflection, we came to realize that in our original submission it was easy to overlook this point, and therefore thank the reviewer for flagging this. To make this clearer, we now also added the relevant statistical clusters in Figure 4a,b and more clearly report them in the associated text: 

      Page 10 (Results):

      “As we show in Figure 4a,b, we found clear gaze signatures of attentional deployment in response to expected (valid) memory tests, no matter whether we had pre-selected trials in which we had also seen such deployment after the cue in gaze (cluster P: 0.115, 0.041, 0.027, <0.001 for 80%-valid, 60%-valid, 80%-invalid, 60%-invalid trials, respectively), or not (cluster P: 0.016, 0.009, 0.001, <0.001 for 80%-valid, 60%-valid, 80%-invalid, 60%-invalid trials, respectively).”
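      Cluster P values of this kind come from cluster-based permutation testing. As a generic illustration (not our exact settings), a one-sample sign-flip version on a participants × time array of toward-minus-away bias is sketched below. Assumptions: the cluster-forming threshold of 2.131 is the two-sided 5% critical t for 16 participants (15 df), only positive clusters are tested, cluster mass is the summed t, and 1000 permutations are used.

```python
import numpy as np


def cluster_permutation_test(data, t_crit=2.131, n_perm=1000, seed=0):
    """One-sample sign-flip cluster test of a participants x time array against 0.

    Only positive clusters (t > t_crit) are considered. Returns a list of
    (start_index, stop_index, p_value) for each observed cluster.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]

    def t_values(x):
        return x.mean(0) / (x.std(0, ddof=1) / np.sqrt(n))

    def clusters(tv):
        # Contiguous runs of supra-threshold t values, with their summed t ("mass").
        out, start = [], None
        for i, above in enumerate(tv > t_crit):
            if above and start is None:
                start = i
            elif not above and start is not None:
                out.append((start, i, tv[start:i].sum()))
                start = None
        if start is not None:
            out.append((start, len(tv), tv[start:].sum()))
        return out

    observed = clusters(t_values(data))
    null_max = np.empty(n_perm)
    for k in range(n_perm):
        # Under H0 each participant's difference wave is symmetric around 0.
        signs = rng.choice([-1.0, 1.0], size=(n, 1))
        null_max[k] = max((m for _, _, m in clusters(t_values(data * signs))),
                          default=0.0)
    return [(s, e, (np.sum(null_max >= m) + 1) / (n_perm + 1))
            for s, e, m in observed]


# Synthetic bias for 16 participants with an effect between samples 80 and 120.
rng = np.random.default_rng(1)
data = rng.normal(size=(16, 200))
data[:, 80:120] += 1.5
results = cluster_permutation_test(data)
for start, stop, p in results:
    print(start, stop, round(p, 4))
```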

      (3) Even if attention deployment occurs in both conditions, the prolonged re-orienting effect could also be caused by trials where participants did not use the initial cue. Unexpected trials usually involve larger and longer brain activity. The authors should perform the same analysis on the time after the removal of trials without toward-saccade after the cue to address this potential confound.

      We thank the reviewer for raising this. It is crucial to point out, however, that after any given 80% or 60% reliable cue, the participants cannot yet know whether the subsequent memory test in that trial will be expected (valid cue) or unexpected (invalid cue). Accordingly, the prolonged re-orienting after unexpected vs. expected memory tests cannot be explained by differential use of the cue (i.e., differential cue-use cannot be a “confound” for differential responses to expected and unexpected memory tests, as observed within the 80 and 60% cue-reliability conditions). 

      Reviewer #2 (Public Review):

      Summary:

      This study utilized EEG-alpha activity and saccade bias to quantify the spatial allocation of attention during a working memory task. The findings indicate a second stage of internal attentional deployment following the appearance of a memory test, revealing distinct patterns between expected and unexpected test trials. The spatial bias observed during the expected test suggests a memory verification process, whereas the prolonged spatial bias during the unexpected test suggests a reorienting response to the memory test. This work offers novel insights into the dynamics of attentional deployment, particularly in terms of orienting and re-orienting following both the cue and memory test.

      Strengths:

      The inclusion of both EEG-alpha activity and saccade bias yields consistent results in quantifying the attentional orienting and re-orienting processes. The data clearly delineate the dynamics of spatial attentional shifts in working memory. The findings of a second stage of attentional re-orienting may enhance our understanding of how memorized information is retrieved.

      Weaknesses:

      Although analyses of neural signatures and saccade bias provided clear evidence regarding the dynamics of spatial attention, the link between these signatures and behavioral performance remains unclear. Given the novelty of this study in proposing a second stage of 'verification' of memory contents, it would be more informative to present evidence demonstrating how this verification process enhances memory performance.

      We thank the reviewer for the positive summary of our work and for highlighting key strengths. We also appreciate the constructive suggestions, such as addressing the link between our observed refocusing signals and behavioral performance in our task. We now performed these additional analyses and added their outcomes to the revised article, as we detail in response to comment 2 below.  

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 2 shows graded spatial modulations in both EEG-alpha activity and saccade bias. However, while the imperative 100% cue conditions and 100% validity conditions largely overlap in EEG-alpha activity, a clear difference is present between these two conditions in saccade bias. The cause of the difference in saccade bias is unclear.

      We thank the reviewer for pointing out this interesting difference. At this stage, it is hard to know with certainty whether this reflects a genuine difference in our 100% reliable and 100% imperative cue conditions that is selectively picked up by our gaze but not alpha marker. Alternatively, this may reflect differential sensitivity of our two markers to different sources of noise. Either way, we agree that this observation is worth calling out and reflecting on when discussing these results: 

      Page 6 (Results):  

      “It’s worth noting that while alpha lateralization shows very comparable amplitudes in the imperative-100% and 100% conditions, the saccade bias was larger following imperative-100% vs. 100% reliable cues. This may reflect a difference between these two cueing conditions that is selectively picked up by our gaze marker (though it may also reflect differential sensitivity of our two markers to different sources of noise). […]”

      (2) Figure 3 shows signatures of attentional re-orienting after the memory test presented at the center. When the cue was not 100% valid, a noticeable saccade bias towards the memorized location of the test item was observed. This finding was explained as reflecting a re-orienting to the mnemonic contents. To strengthen this interpretation, I suggest providing evidence for the link between the attentional re-orienting signatures and memory performance.

      We thank the reviewer for this constructive suggestion. We have now sorted trials by behavioral performance using a median split on RT (fast-RT vs. slow-RT trials) and reproduction error (high-accuracy vs. low-accuracy trials). Because we believe the outcomes of these analyses increase the transparency as well as the comprehensiveness of our article, we have now included them as Supplementary Figure S3.

      As shown below, we were able to link the saccade bias following the memory test to subsequent performance, but this reached significance only for the 80% valid-cue trials when splitting by RT (cluster P = 0.001). For the other conditions, we could not establish a reliable difference by our performance splits. Possibly this is due to a lack of sensitivity, given the relatively large number of conditions we had and, consequently, the relatively small number of trials we therefore had per condition (particularly in the invalid-cue condition with unexpected memory tests). We now bring forward these additional outcomes at the relevant section in our Results: 

      Page 7 (Results):

      “We also sorted patterns of gaze bias after the memory test by performance but could only establish a link between this gaze bias and RT following expected memory tests in our 80% cue-reliability condition (cluster P = 0.001, Supplementary Figure S3). The lack of significant statistical differences in the remaining conditions may possibly reflect a lack of sensitivity (insufficient trial numbers) for this additional analysis.”
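      A median split of the kind described above can be sketched as follows, assuming single-trial bias time courses are available per participant and condition. Trials tied with the median go to the slower (or less accurate) half, an arbitrary choice in this sketch, and the toy values are placeholders. The same function applies to reproduction error in place of RT.

```python
import numpy as np


def median_split(values, per_trial_timecourses):
    """Average single-trial time courses separately below vs. at/above the median."""
    values = np.asarray(values, dtype=float)
    low = values < np.median(values)  # ties with the median go to the "high" half
    return (per_trial_timecourses[low].mean(axis=0),
            per_trial_timecourses[~low].mean(axis=0))


# Four trials, two time points: here the faster trials carry the larger bias.
rt = [450, 520, 610, 700]
bias = np.array([[1.0, 0.5], [1.0, 0.5], [0.0, 0.0], [0.0, 0.0]])
fast, slow = median_split(rt, bias)
# fast holds [1.0, 0.5]; slow holds [0.0, 0.0]
```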

      (3) When comparing the time course of attentional re-orienting after the memory test, a prolonged attentional re-orienting was observed for unexpected memory tests compared to the expected ones. While the onset latency was similar for unexpected and expected memory tests, the offset latency was prolonged for the unexpected memory test. Could this be attributed to the learned tendency to saccade toward the expected location in more valid trials? In this case, the prolonged re-orienting may indicate increased efforts in suppressing the previously learned tendency.

      We thank the reviewer for bringing up this interesting possibility. In our original interpretation, this prolonged signal reflects a longer time needed to bring the unexpected memory content ‘back in focus’ before being able to report its orientation. At the same time, we agree that there are alternative explanations possible, such as the one raised by the reviewer. We now make this clearer when discussing this finding: 

      Page 14 (Discussion): 

      “[…] attentional deployment did become prolonged when re-focusing the unexpected memory item, likely reflecting prolonged effort to extract the relevant information from the memory item that was not expected to be tested. However, there may also be alternative accounts for this observation, such as suppressing a learned tendency to saccade in the direction of the expected item following an unexpected memory test.”

      (4) To test whether the re-orienting signature is predominantly influenced by trials where participants delayed the use of cue information until the memory test appeared, the authors sorted the trials based on saccade bias after the initial cue. However, it would be more informative to depict the reorienting patterns by sorting trials based on memory performance. The rationale is that in trials where participants delayed using the initial retro-cue, memory performance (e.g., measured by reproduction error) might be less precise due to the extended memory retention period. Compared to saccade bias for initial orienting, memory performance could provide more reliable evidence as it represents a more independent measure.

      We thank the reviewer for this suggestion. As delineated in response to comment 2, we now conducted this additional analysis and added the relevant outcomes to our article.  

      (5) While the number of trials was well-balanced across blocks (~ 240 trials), how did the authors address the imbalance between valid and invalid trials, especially in the 80% cue validity block?

      We thank the reviewer for raising this point. First, we wish to point out that while trial numbers will indeed impact the sensitivity for finding an effect, trial numbers do not bias the mean, and therefore also not the comparison between means. In this light, it is vital to appreciate that our findings do not reflect a significant effect in valid trials but no significant effect in invalid trials (which we agree could be due to a difference in trial numbers), but rather a statistical difference between valid and invalid trials. This significant difference in the means between valid and invalid trials cannot be attributed to a difference in trial numbers between these conditions.

      Having clarified this, we nevertheless agree that it is also worthwhile to empirically validate this assertion and show how our findings hold even when carefully matching the number of trials between valid and invalid conditions (i.e., between expected and unexpected memory tests). To do so, we ran a sub-sampling analysis where we sub-sampled the number of valid trials to match the number of invalid trials available per condition (and averaged the results across 1000 random sub-samplings to increase reliability). As anticipated, this replicated our findings of robust differences between the gaze bias following expected and unexpected memory tests in both our 80 and 60% cue-reliability conditions. We now present these additional outcomes in Supplementary Figure S4.

      Because we agree this is an important re-assuring control analysis, we have now added this to our article:

      Page 9 (Results):

      “To rule out the possibility that the saccade-bias differences following expected and unexpected memory tests are caused by uneven trial numbers (200 vs. 50 trials in the 80% cue-reliability condition, 150 vs. 100 trials in the 60% cue-reliability condition), we ran a sub-sampling analysis where we sub-sampled the number of valid trials to match the number of invalid trials available per condition (averaging the results across 1000 random sub-samplings to increase reliability). As shown in Supplementary Figure S4, this complementary sub-sampling analysis confirmed that our observed differences between the saccade bias following expected and unexpected memory tests in both 80% and 60% cue-reliability conditions are robust even when carefully matching the number of trials between validly cued (expected) and invalidly cued (unexpected) memory tests.”
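      The sub-sampling control can be sketched as below, assuming per-trial bias time courses for one participant and condition; the synthetic data are placeholders and only the 200 vs. 50 trial counts come from the 80% condition described above. Averaging 1000 subsample means converges towards the mean of all valid trials, consistent with the point that trial numbers do not bias the mean; what the sub-sampling matches is the number of trials entering each estimate.

```python
import numpy as np


def subsampled_mean(valid_trials, n_target, n_iter=1000, seed=0):
    """Average, over n_iter random draws, of the mean of n_target valid trials."""
    rng = np.random.default_rng(seed)
    total = np.zeros(valid_trials.shape[1:])
    for _ in range(n_iter):
        # Draw without replacement so each subsample mimics a real set of n_target trials.
        pick = rng.choice(valid_trials.shape[0], size=n_target, replace=False)
        total += valid_trials[pick].mean(axis=0)
    return total / n_iter


# 80% condition: 200 valid vs. 50 invalid trials, 100 time points of placeholder data.
rng = np.random.default_rng(2)
valid = rng.normal(size=(200, 100))
invalid = rng.normal(size=(50, 100))
valid_matched = subsampled_mean(valid, n_target=invalid.shape[0])
difference = invalid.mean(axis=0) - valid_matched  # expected-minus-unexpected contrast input
```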

      Reviewer #3 (Public Review):

      Summary:

      Wang and van Ede investigate whether and how attention re-orients within visual working memory following expected and unexpected centrally presented memory tests. Using a combination of spatial modulations in neural activity (EEG-alpha lateralization) and gaze bias quantified as time courses of microsaccade rate, the authors examined how retro cues with varying levels of reliability influence attentional deployment and subsequent memory performance. The conclusion is that attentional reorienting occurs within visual working memory, even when tested centrally, with distinct patterns following expected and unexpected tests. The findings provide new value for the field and are likely of broad interest and impact, by highlighting working memory as an action-bound process (in)dependent on (an ambiguous) past.

      Strengths:

      The study uniquely integrates behavioral data (accuracy and reaction time), EEG-alpha activity, and gaze tracking to provide a comprehensive analysis of attentional re-orienting within visual working memory. As typical for this research group, the validity of the findings follows from the task design that effectively manipulates the reliability of retro cues and isolates attentional processes related to memory tests. The use of well-established markers for spatial attention (i.e. alpha lateralization) and more recently entangled dependent variable (gaze bias) is commendable. Utilizing these dependent metrics, the concise report presents a thorough analysis of the scaling effects of cue reliability on attentional deployment, both at the behavioral and neural levels. The clear demonstration of prolonged attentional deployment following unexpected memory tests is particularly noteworthy, although there are no significant time clusters per definition as time isn't a factor in a statistical sense, the jackknife approach is convincing. Overall, the evidence is compelling allowing the conclusion of a second stage of internal attentional deployment following both expected and unexpected memory tests, highlighting the importance of memory verification and re-orienting processes.

      Weaknesses:

      I want to stress upfront that these weaknesses are not specific to the presented work and do not affect my recommendation of the paper in its present form.

      Although the sample size is consistent with previous studies, a larger sample could enhance the generalizability and robustness of the findings. The authors acknowledge high noise levels in EEG-alpha activity, which may affect the reliability of this marker. This is a general issue in non-invasive electrophysiology that cannot be handled by the authors, but interested readers should be aware of it. Effectively, the gaze analysis appears "better" (more sensitive) in part because of its better SNR. The latter also sets the boundaries for single-trial analyses, as the authors correctly mention. In terms of generalizability, I am convinced that the main outcome will likely generalize to different samples and stimulus types. Yet, as is typical for the field, future research could explore different contexts and task demands to validate and extend the findings. The authors provide the means to do so (including shared data and code).

      We thank the reviewer for summarising our work and for carefully delineating its strengths. We also appreciate the mentioning of relevant generic limitations and agree that important avenues for future studies will be to expand this work with larger sample sizes, complementary measurement techniques, and complementary task contexts and stimuli.    

      Reviewer #3 (Recommendations For The Authors):

      In the conclusion, Wang and van Ede successfully demonstrate that attentional re-orienting occurs within visual working memory following both expected and unexpected tests. The conclusions are supported by the data and analyses applied, showing that attentional deployment is modulated by the reliability of retro cues. Centrally presented memory tests can invoke either a verification or a revision of internal focus, the latter so far not considered in either theory or experimental design in cognitive neuroscience.

      I don't have any recommendations that will significantly change the conclusions.

      We thank the reviewer for having carefully evaluated our work and hope the reviewer will also perceive the changes we made and the additional analyses we added in response to the other two reviewers as further strengthening our article.

      Reference

      Brandt SA, Stark LW. 1997. Spontaneous eye movements during visual imagery reflect the content of the visual scene. J Cogn Neurosci 9. doi:10.1162/jocn.1997.9.1.27

      de Vries E, Fejer G, van Ede F. 2023. No obligatory trade-off between the use of space and time for working memory. Communications Psychology.

      Engbert R, Kliegl R. 2003. Microsaccades uncover the orientation of covert attention. Vision Res 43. doi:10.1016/S0042-6989(03)00084-1

      Ferreira F, Apel J, Henderson JM. 2008. Taking a new look at looking at nothing. Trends Cogn Sci 12. doi:10.1016/j.tics.2008.07.007

      Hafed ZM, Clark JJ. 2002. Microsaccades as an overt measure of covert attention shifts. Vision Res 42. doi:10.1016/S0042-6989(02)00263-8

      Hansen JC, Hillyard SA. 1980. Endogenous brain potentials associated with selective auditory attention. Electroencephalogr Clin Neurophysiol 49. doi:10.1016/0013-4694(80)90222-9

      Johansson R, Johansson M. 2014. Look Here, Eye Movements Play a Functional Role in Memory Retrieval. Psychol Sci 25. doi:10.1177/0956797613498260

      Kiesel A, Miller J, Jolicœur P, Brisson B. 2008. Measurement of ERP latency differences: A comparison of single-participant and jackknife-based scoring methods. Psychophysiology 45. doi:10.1111/j.1469-8986.2007.00618.x

      Kosciessa JQ, Grandy TH, Garrett DD, Werkle-Bergner M. 2020. Single-trial characterization of neural rhythms: Potential and challenges. Neuroimage 206. doi:10.1016/j.neuroimage.2019.116331

      Laeng B, Bloem IM, D’Ascenzo S, Tommasi L. 2014. Scrutinizing visual images: The role of gaze in mental imagery and memory. Cognition 131. doi:10.1016/j.cognition.2014.01.003

      Liu B, Alexopoulou SZ, van Ede F. 2023. Jointly looking to the past and the future in visual working memory. Elife.

      Liu B, Nobre AC, van Ede F. 2022. Functional but not obligatory link between microsaccades and neural modulation by covert spatial attention. Nat Commun 13. doi:10.1038/s41467-022-31217-3

      Luck S. 2005. Ten Simple Rules for Designing ERP Experiments. Event-related potentials: A methods handbook.

      Macdonald JSP, Mathan S, Yeung N. 2011. Trial-by-trial variations in subjective attentional state are reflected in ongoing prestimulus EEG alpha oscillations. Front Psychol 2. doi:10.3389/fpsyg.2011.00082

      Martarelli CS, Mast FW. 2013. Eye movements during long-term pictorial recall. Psychol Res 77. doi:10.1007/s00426-012-0439-7

      Richardson DC, Spivey MJ. 2000. Representation, space and Hollywood Squares: Looking at things that aren’t there anymore. Cognition 76. doi:10.1016/S0010-0277(00)00084-6

      Spivey MJ, Geng JJ. 2001. Oculomotor mechanisms activated by imagery and memory: Eye movements to absent objects. Psychol Res 65. doi:10.1007/s004260100059

      van Ede F, Chekroud SR, Nobre AC. 2019. Human gaze tracks attentional focusing in memorized visual space. Nat Hum Behav. doi:10.1038/s41562-019-0549-y

      Wöstmann M, Alavash M, Obleser J. 2019. Alpha oscillations in the human brain implement distractor suppression independent of target selection. Journal of Neuroscience 39. doi:10.1523/JNEUROSCI.1954-19.2019

      Wynn JS, Shen K, Ryan JD. 2019. Eye movements actively reinstate spatiotemporal mnemonic content. Vision (Switzerland) 3. doi:10.3390/vision3020021

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript investigates the role of the membrane-deforming cytoskeletal regulator protein Abba in cortical development and its potential implications for microcephaly. It is a valuable contribution to the understanding of Abba's role in cortical development. The strengths and weaknesses identified in the manuscript are outlined below:

      Clinical Relevance:

      The authors identified a patient with microcephaly and a patient with an intellectual disability harboring the Abba R671W variant, adding a clinically relevant dimension to the study.

      Mechanistic Insights:

      The study offers valuable mechanistic insights into the development of microcephaly by elucidating the role of Abba in radial glial cell proliferation, radial fiber organization, and the migration of neuronal progenitors. The identification of Abba's involvement in the cleavage furrow during cell division, along with its interaction with Nedd9 and positive influence on RhoA activity, adds depth to our understanding of the molecular processes governing cortical development. Though the reported results establish the novel interaction between Abba and Nedd9, the authors have not addressed whether the mutant protein loses this interaction and whether that results in the observed effects.

      We appreciate the reviewer’s observation and fully agree that our study does not provide direct evidence that the phenotypes induced by the R671W mutant are mediated through NEDD9. We sincerely apologize if the manuscript inadvertently conveyed this impression.

      While we show that the interaction with NEDD9 plays a role in the action of ABBA, our findings suggest that NEDD9 and RhoA activation have a minor influence on the phenotypes induced by this mutation, as highlighted by the evidence we presented.

      We would like to point out that we have previously addressed this point in the discussion section of the manuscript. For clarity, below is an excerpt from that section:

      “heterozygous expression of the human R671W variant would exert a dominant-negative effect on ABBA's role in brain development, leading to microcephaly and cognitive delay. This notion is supported by recent work describing an additional patient carrying the R671W variant (ref. 42). In the same study, significant neurological phenotypes were observed in a Drosophila model in which mim, the ortholog of human MTSS2 and MTSS1, was deleted. However, from a clinical genetics standpoint, it is unlikely to find patients with the recurrent R671W mutation without any homozygous or compound heterozygous loss-of-function mutations elsewhere in the ABBA gene. This could also suggest a gain-of-function effect of the R671W mutation. Supporting this notion, overexpressing ABBA-R671W in cells expressing wild-type Abba in this study did not result in a dominant-negative decrease in RhoA activation, nor did it affect the expression of PH3 in vivo. These findings make it plausible that the mechanism responsible for the phenotype associated with overexpression of the human variant primarily involves post-cell-division processes, such as cell migration.”

      We have made corrections to the new version of the manuscript to emphasize this further.

      In Vivo Validation:

      The overexpression of mutant Abba protein (R671W) resulting in phenotypic similarities to Abba knockdown effects supports the significance of Abba in cortical development.

      Reviewer #2 (Public Review):

      Summary:

      Carabalona and colleagues investigated the role of the membrane-deforming cytoskeletal regulator protein Abba (MTSS1L/MTSS2) in cortical development to better understand the mechanisms of abnormal neural stem cell mitosis. The authors used short hairpin RNA targeting Abba20 with a fluorescent reporter, coupled with in-utero electroporation of E14 mice, to show changes to neural progenitors. They performed flow cytometry for in-depth cell cycle analysis of the impact of Abba-shRNA on neural progenitors and determined an accumulation in S phase. Using cultured rat glioma cells and live imaging of cortical organotypic slices from mice in utero electroporated with Abba-shRNA, the authors found that Abba plays a prominent role in cytokinesis. They then used a yeast-two-hybrid screen to identify three high-confidence interactors: Beta-Trcp2, Nedd9, and Otx2. They used immunoprecipitation experiments from E18 cortical tissue coupled with C6 cells to show that Abba is required for Nedd9 localization to the cleavage furrow/cytokinetic bridge. The authors performed shRNA knockdown of Nedd9 by in-utero electroporation of E14 mice and observed results similar to those with Abba-shRNA. They tested a human variant of Abba using in-utero electroporation of cDNA and found disorganized radial glial fibers and misplaced, multipolar neurons, but not the effect on cell division seen in the shRNA-Abba model.

      Strengths:

      A fundamental question in biology about the mechanics of neural stem cell division.

      Directly connecting effects in Abba protein to downstream regulation of RhoA via Nedd9.

      Incorporation of human mutation in ABBA gene.

      Use of novel technologies in neurodevelopment and imaging.

      Weaknesses:

      Unexplored components of the pathway (such as what neurogenic populations are impacted by Abba mutation) and unleveraged aspects of their data (such as the live imaging) limit the scope of their findings and leave significant questions about the effect of ABBA on radial glia development.

      (1) The claim of disorganized radial glial fibers lacks quantifications.

      On page 11, the authors claim that knockdown of Abba leads to changes in radial glial morphology observed with vimentin staining. Here they claim misoriented apical processes, detached end feet, and decreased number of RGP cells in the VZ. However, they do not provide quantification of process orientation to better support their first claim. Measurements of radial glia fiber morphology (directionality, length) and angle of division would be metrics that can be applied to data.

      In the corrected version of the manuscript, we provide a new quantification of changes in the dispersion of vimentin immunostaining (Supplementary Figure 1).

      Some of these analyses could be done in their time-lapse microscopy images, such as to quantify the number of cell divisions during their period of analysis (though that is short-15 hours).

      This is indeed a very good idea. We have reanalyzed the recordings to follow cell division. Unfortunately, the number of cells that we were able to follow was low, making statistical analysis of the data unreliable. As the reviewer alluded to, recording times longer than 15 h are required to draw reliable conclusions. Instead, we have performed live-cell imaging using Anillin-GFP co-electroporated with RFP as a marker of mitotic progression. We monitored the distribution of cells showing accumulation of Anillin-GFP in control (Scramble) and ABBA-shRNA3 conditions (these data were added to new Supplementary Figure 3). Anillin has been shown to be an efficient tool for monitoring cell division in vivo, in particular because it displays accumulation and a correlated increase in Anillin-GFP intensity (Hesse et al., Nat. Commun. 2012, DOI: 10.1038/ncomms2089).

      (2) It is unclear where the effect is:

      -In RG or neuroblasts? Is it in cell cleavage that results in the accumulation of cells at VZ (as sometimes indicated by their data like in Figure 2A or 4D)?

      The data suggest that radial glial (RG) cells are indeed blocked prior to abscission. This phenomenon might contribute to the accumulation of cells at the ventricular zone (VZ), as indicated by observations such as those in Figure 2A and 4D. The interruption in cell cleavage likely prevents the proper progression of division, causing RG cells to remain at the VZ rather than proceeding with their normal differentiation or migration processes. This finding highlights a potential mechanistic link between disrupted abscission and cell accumulation in the VZ.

      Interrogation of cell death (such as by cleaved caspase 3) would also help.  

      Caspase-3 cleavage is widely used as a marker for apoptosis; however, it may not be the most reliable tool for monitoring apoptosis during brain cortical development. The developing brain is a highly dynamic environment where caspase-3 activation can be transient and involved in non-apoptotic processes, such as synaptic pruning and neuronal remodeling. This makes it challenging to distinguish caspase-3 activity associated with apoptosis from its roles in physiological processes.

      In contrast, monitoring overall cell survival provides a more reliable measure of developmental outcomes, as it reflects the net balance of cell death and survival mechanisms. By focusing on cell survival (e.g., quantification of the number of RGPs), we can better assess the functional consequences of apoptosis and its interplay with neurogenesis and other developmental processes. In line with this, we have added more data on the quantification of RGPCs as well as their distribution in new Supplementary Figure 3.

      Given their time-lapse, can they identify what is happening to the RG fiber?

      Both apical and basal endfeet appear to detach and retract prior to radial glial (RG) cell death. This is evident in Figure 1D, as well as from our observation of cellular bodies located far from the ventricular surface (VS), as demonstrated in the new Supplementary Figure 3.

      The authors describe a change in "migration" but do not show evidence for this for either progenitor or neuroblast populations. Given they have nice time-lapse imaging data, could they visualize progenitor versus young neuron migration? Analysis of neuroblasts (such as with doublecortin expression in the tissue) would also help understand any issues in migration (of neurons v stem cells).

      This is an excellent question that arises from the extensive data presented in this study. Addressing it would require repeating a significant portion of the experiments. We fully agree with the reviewer that these are important and obvious questions that warrant a dedicated study to answer them thoroughly. Additionally, we believe that the data showing the accumulation of migrating electroporated cells in the ventricular (V) and subventricular (SV) zones provide compelling evidence of abnormal migration in ABBA-shRNA electroporated cells.

      -At cleavage furrow? In abscission? There is high-resolution data that highlights the cleavage furrow as the location of interest (Figure 3A), however, there is also data (Figure 3B) to suggest Abba is expressed elsewhere as well and there is an overall soma decrease. More detail of the localization of Abba during the division process would be helpful for example, could cleavage furrow proteins, such as Aurora B, co-localization (and potentially co-IP) help delineate subpopulations of Abba protein? Furthermore, the FRET imaging is a unique way to connect their mutation with function - could they measure/quantify differences at furrow compared to the rest of soma to further corroborate that the Abba-associated RhoA effect was furrow-enriched?

      In the corrected version of the manuscript, we include a new quantification of RhoA activity in the region corresponding to the cleavage furrow (new Figure 5). These new data show results similar to the previous ones and indicate that the observed changes derive primarily from the cleavage furrow region. In the future, a detailed dissection of the molecules involved in the mechanism would be highly desirable. These notions are now included in the discussion.

      -The data highlights nicely that a furrow doesn't clearly form when ABBA expression and subsequent RhoA activity are decreased (in Figure 3 or 5A). Does this lead to cells that can't divide because of poor abscission, especially since "rounding" still occurs? Or abnormal progenitors (with loss of fiber or inability to support neuroblast migration)? Or abnormal progression of progenitors to neuroblasts?

      Our findings, combined with previous results, suggest multiple mechanisms through which ABBA depletion and subsequent Nedd9 and RhoA signaling disruptions could impact progenitor cells and neuroblasts. Below is a detailed response to each question: 

      (1) Do cells fail to divide due to poor abscission?

      Nedd9 is a key regulator of RhoA signaling, which could be essential for cleavage furrow ingression and abscission. Reduced Nedd9 expression may lead to a failure to activate RhoA, thereby impairing cleavage furrow ingression. Furthermore, since RhoA deactivation is critical for successful abscission, any disruption in this signaling pathway could compromise the final stages of cytokinesis. While we do not directly observe failed abscission, the impaired furrow formation in Figures 3 and 5A aligns with the hypothesis that some cells may struggle to complete division due to defects in RhoA-mediated abscission.

      (2) Are abnormal progenitors generated (e.g., loss of fiber or inability to support neuroblast migration)?

      Disrupted Nedd9 expression not only affects cell cycle progression but also influences the structural integrity of radial glial progenitors (RGPs). RGPs with impaired cleavage furrow ingression may exhibit detachment of apical and basal endfeet (Supplementary Figure 3), leading to abnormalities in their scaffold function. This structural disruption likely contributes to the accumulation of electroporated cells in the ventricular (V) and subventricular (SV) zones (Figure 5A), supporting the idea that abnormal progenitors fail to support proper neuroblast migration. 

      (3) Is there abnormal progression of progenitors to neuroblasts?

      Given that Nedd9 triggers cells to enter mitosis, its impaired function may prevent progenitors from properly progressing through the cell cycle, causing cell cycle arrest and eventually decreased survival. This would directly impact the ability of progenitors to transition into neuroblasts. Moreover, the abnormal membrane composition and PI(4,5)P2 enrichment we hypothesize during cytokinesis could disrupt ABBA recruitment and its interaction with Nedd9. This disruption would impair RhoA activation, further compromising the progression of progenitors to neuroblasts.

      In conclusion, our findings suggest that impaired ABBA expression disrupts Nedd9 and RhoA signaling, leading to poor cleavage furrow ingression, abnormal progenitor structure, and defective neuroblast migration. These processes collectively contribute to developmental defects in the cortex. Future studies focusing on live imaging of cytokinesis and cell fate mapping will help to elucidate these mechanisms further.

      (3) Limited to a singular time point of mouse cortical development

      On page 13, the authors outline the results of their Y2H screen with the identification of three high-confidence interactors. Notably, they used an E10.5-E12.5 mouse brain embryo library rather than one that includes E14, the age of their in-utero electroporation mice. Many of the authors' claims focus on in-utero electroporation of shRNA-Abba of E14 mice that are then evaluated at E16-18. Justification for the focus on this age range should be included to support that their findings can then be applied to all mouse corticogenesis.

      We thank the reviewer for pointing this out. Indeed, the data suggest that the interaction between ABBA and Nedd9 occurs before E14. The reason for addressing the questions at E14 is that, in earlier work, we showed that ABBA is mainly expressed during E10.5-12.5 in the floorplate structure formed by radial glia. The radial glia-specific expression was confirmed through double staining with radial glial (RC2) and neuronal (Tuj1) markers at E12.5 (see Saarikangas et al. J. Cell Sci. 121:1444-1454, 2008). Thus, we consider the Y2H library relevant for identifying ABBA's interactors within radial glia. We have specified this better in the corrected manuscript.

      (4) Detail of the effect of the human variant of the ABBA mutation in mice is lacking.

      Their identification of the R671W mutation is interesting and the IUE model warrants more characterization, as they did with their original KD experiments.

      We have now included additional data in the corrected manuscript showing R671W-dependent changes in INM (Supplementary Figure 3).

      Could they show that Abba protein levels are decreased (in either cell lines or electroporated tissue)?

      Estimation of ABBA expression in cells expressing ABBA-R671W, as in Supplementary Figure 5, did not show a significant change.

      -While time-lapse morphology might not have been performed, more analysis on cell division phenotype (such as plane of division and radial glia morphology) would be helpful. 

      This would indeed be very informative, but we were not able to perform these analyses on the existing dataset.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Here are some suggestions for targeting some of the weaknesses by additional experiments:

      Regional Demarcation in Radial Glial Cell Population:

      While the authors demonstrate a decrease in overall RFP-positive cells in response to Abba knockdown, the distinction between different regions should be demarcated using cortical layer-specific markers (e.g., CUX1/BRN2 for the upper layer and CTIP2/FOXP2). Quantification based on regional markers would enhance accuracy and meaningful interpretation.

      In order to harmonize the quantification across developmental stages, we used a broader definition of the cortical regions that may not fully match the regions identified by Cux1 and CTIP2 staining. However, we have now included Cux1 and CTIP2 staining in Supplementary Figure 1, showing the corresponding regions defined in the manuscript.

      Mitotic Stage Marker and BrdU Staining:

      The discrepancy between the unchanged staining for the mitotic stage marker PH3 and the reported decrease in Ki67 staining calls for further clarification. Additionally, the use of BrdU staining could distinguish the effects on dividing cells after Abba knockdown. The authors are encouraged to explore these aspects further, including their applicability to NEDD9 knockdown and Abba mutant overexpression.

      As suggested by the reviewer elsewhere, we made use of live imaging. We monitored the distribution of cells showing accumulation of Anillin-GFP in control (Scramble) and ABBA-shRNA3 conditions (these data have been added to the new Supplementary Figure 3). Anillin has been shown to be an efficient tool for monitoring cell cycle stages in vivo (Hesse et al., Nat. Commun. 2012, DOI: 10.1038/ncomms2089). Interestingly, we observed an increase in the number of cells displaying accumulated Anillin in the ABBA-shRNA3 condition, which is consistent with arrested mitotic progression.

      Quantification of Cytokinesis Effects:

      The brain slices illustrating the effects of Abba knockdown on cytokinesis would benefit from a quantification depicting changes in interkinetic nuclear migration and the number of successful mitosis events. This would enhance the clarity and interpretation of the observed effects.

      In the revised manuscript, we have included new data in Supplementary Figure 3 where we report the quantification of the distance of the RGCs from the ventricle to address the reviewer’s comments. We were not entirely sure about the comment regarding quantification of successful mitosis events, but, as specified above, we have included new data from the monitoring of Anillin. We hope to perform more detailed experiments and analyses in future studies.

      Loss of Interaction and NEDD9 Localization:

      The manuscript lacks an exploration of the loss or decrease in interaction between Abba and NEDD9 in the case of the pathogenic patient-derived mutation in Abba. Addressing this aspect is crucial, as it may shed light on the underlying causes of the observed effects. Furthermore, investigating changes in NEDD9 localization following overexpression of the Abba mutant would provide additional insights.

      We fully agree with the reviewer’s comment. Unfortunately, the anti-NEDD9 antibody performed poorly in slice immunohistochemistry, which hampered further reliable investigation of expression and distribution changes in vivo. Resolving this issue and providing a more detailed characterization of the mechanism of Abba-NEDD9 interaction will be important in future studies.

      Overall, I believe that with minor revisions and additional contextualization, the manuscript has the potential to make a significant contribution to the field. I recommend acceptance pending the incorporation of the suggested revisions.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is generally well-organized. We hope that given their nice experimental systems, many of the comments and questions can be addressed with their data already on hand.

      Minor Comments

      • For Figure 6E, a close-up of the vimentin staining would be helpful; it is hard to visualize radial glia morphology at the current magnification.

      This has been corrected in the new version of the manuscript.

      • For the in utero electroporation what was their rationale for 2-4 day interval before evaluation? For example, waiting for more cortical plate development to be able to manifest long-term effects.

      We observed massive cell death at E18; in only a few of those brains were we still able to observe RFP-positive cells. We also tried P6 animals, but none of them had a significant number of remaining electroporated cells. We therefore decided to focus on E17, 3 days after electroporation, when shRNA expression is still sufficient.

      • Figure 4E-F lacks images of controls for comparison of effect.

      This has been corrected in the revised version of the manuscript.

    1. Author response:

      Reviewer #1:

      The manuscript Xu et al. explores the regulation of the microtubule minus end protein CAMSAP2 localization to the Golgi by the Serine/threonine-protein kinase MARK2 (PAR1, PAR1B). The authors utilize immunofluorescence and biochemical approaches to demonstrate that MARK2 is localized at the Golgi apparatus via its spacer domain. They show that depletion of this protein alters Golgi morphology and diminishes CAMSAP2 localization to the Golgi apparatus. The authors combine mass spectroscopy and immunoprecipitation to show that CAMSAP2 is phosphorylated at S835 by MARK2, and that this phosphorylation regulates localization of CAMSAP2 at Golgi membranes. Further, the authors identify USO1 (p115) as the Golgi resident protein mediating CAMSAP2 recruitment to the Golgi apparatus following S835 phosphorylation. The authors would need to address the following queries to support their conclusions.

      We sincerely thank the reviewer for their valuable time and effort in evaluating our manuscript. We deeply appreciate the constructive feedback and insightful suggestions, which have been instrumental in improving the quality and clarity of our study. We have carefully considered all the comments and have made the necessary revisions to address the concerns raised.

      Major Comments 

      (1) Dynamic localization of CAMSAP2 during Golgi reorientation

      - The authors use fixed wound edge assays and co-localization analysis to describe changes in CAMSAP2 positioning during Golgi reorientation in response to polarizing cues (a free wound edge in this case). In Figure 1C, they present a graphical representation of quantified immunofluorescence images, using color coding to describe the three states of Golgi reorientation in response to a wound (green, blue, red indicating non-polarised, partial and complete Golgi reorientation, respectively). They then use these 'colour coded' classifications to quantitate CAMSAP2/GM130 co-localization. It is unclear why the authors have not just used representative immunofluorescence images in the main figures. Transparent color overlays could be placed over the cells in the representative images to indicate which of the three described states each cell is currently exhibiting. However, for clarity, I would recommend changing the color coded 'states' to a descriptor rather than a color, i.e., Figure 1D x axis labels should be 'complete' and 'partial', instead of 'red' and 'blue'.

      Thank you for this insightful suggestion. We have added representative immunofluorescence images with transparent color overlay to indicate the three Golgi orientation states. These images are included in Supplementary Figure 2B-C, providing a clear visual reference for the quantitative data. Additionally, we have revised the x-axis labels in Figure 1E from "Red" and "Blue" to "Complete" and "Partial" to ensure clarity and consistency with the descriptive terminology in the text. These changes are described in the Results section (page 7, lines 15-19) and the figure legend (page 29, lines 27-29).

      We believe these updates improve the clarity and accessibility of our figures and hope they address the reviewer’s concerns.

      - Note: Figure 2F-G is semi-quantitative; why did the authors not simply measure the Golgi angle using the nucleus and Golgi distribution?

      We appreciate the reviewer’s comment on this point. Following the recommendation, we have performed an additional analysis measuring Golgi orientation angles based on the nucleus-Golgi distribution. This quantitative approach complements our initial semi-quantitative analysis and provides a more precise assessment of Golgi orientation during cell migration.

      The new data have been incorporated into Supplementary Figure 1F-H. These results clearly demonstrate the consistency between the quantitative and semi-quantitative methods, further validating our findings and highlighting the dynamic changes in Golgi orientation during cell migration. These changes are described in the Results section (page 6, lines 24-31).
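      For illustration only, the angle-based measure can be sketched as below. This is a minimal Python sketch under stated assumptions, not the exact analysis pipeline behind Supplementary Figure 1F-H: it assumes that nucleus and Golgi centroids have already been segmented as (x, y) coordinates and that the direction towards the wound is known. It treats a Golgi lying within a 120° sector centred on the wound direction as "oriented", a threshold chosen here purely as an example. The function names are hypothetical.

```python
import math


def golgi_orientation_angle(nucleus_xy, golgi_xy, wound_direction=(0.0, 1.0)):
    """Unsigned angle (degrees, 0-180) between the nucleus-to-Golgi vector
    and the direction pointing towards the wound edge."""
    # Vector from the nuclear centroid to the Golgi centroid.
    vx = golgi_xy[0] - nucleus_xy[0]
    vy = golgi_xy[1] - nucleus_xy[1]
    wx, wy = wound_direction
    # atan2(cross, dot) gives the angle between the two vectors without
    # normalising them; abs() discards the sign (left vs right tilt).
    dot = vx * wx + vy * wy
    cross = vx * wy - vy * wx
    return abs(math.degrees(math.atan2(cross, dot)))


def is_oriented(angle_deg, sector_deg=120.0):
    """Hypothetical criterion: oriented if within a sector centred on the wound."""
    return angle_deg <= sector_deg / 2.0


def fraction_oriented(cells, wound_direction=(0.0, 1.0), sector_deg=120.0):
    """Fraction of (nucleus_xy, golgi_xy) pairs classified as oriented."""
    angles = [golgi_orientation_angle(n, g, wound_direction) for n, g in cells]
    return sum(is_oriented(a, sector_deg) for a in angles) / len(angles)


# Example: wound edge towards +y; one Golgi faces the wound, one faces away.
cells = [((0.0, 0.0), (0.0, 5.0)), ((10.0, 0.0), (10.0, -3.0))]
print(fraction_oriented(cells))  # prints 0.5
```

      Using an unsigned angle treats left- and right-tilted Golgi as equivalent; a finer binning of the same angle could be mapped onto the complete/partial/non-polarised categories if desired.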

      - While it is established that the Golgi is dispersed during reorientation in wound edge migration, the Golgi apparatus also becomes dispersed/less condensed prior to cell division. As the authors used fixed images, how can they be sure that the Golgi morphology or CAMSAP2 localization in 'blue cells' is indicative of Golgi reorientation and not division? Live imaging of cells expressing CAMSAP2 and an additional Golgi marker could be used to demonstrate that the described changes in Golgi morphology and CAMSAP2 localization occur during the rear-to-front transition of the Golgi.

      Thank you for raising this important question. To address this concern, we carefully examined the nuclear morphology of cells with dispersed Golgi and found no evidence of mitotic features, indicating that these cells are not undergoing division (Figure 1A, Supplemental Figure 2A). Furthermore, during the scratch wound assay, we cultured the cells in 2% serum, which helps minimize the impact of cell division. This analysis has been added to the Results section (page 7, lines 19-22 in the revised manuscript).

      Additionally, we conducted live-cell imaging, as suggested, using cells expressing a Golgi marker. This approach confirmed that Golgi dispersion occurs transiently during reorientation in cell migration. The new live-cell imaging data have been incorporated into Supplementary Figure 2A, and the corresponding description has been updated in the Results section (page 7, lines 2-5).

      Finally, considering that overexpression of CAMSAP2 can lead to artifactually condensed Golgi structures, we used endogenous staining to observe CAMSAP2 localization at different stages of migration. These observations provide a clearer understanding of CAMSAP2 dynamics during Golgi reorientation and are now presented in revised Figure 1A-B. This information has been described in the Results section (page 7, lines 5-10).

      We hope these additions and clarifications address the reviewer’s concerns. Once again, we are deeply grateful for this constructive feedback, which has greatly improved the robustness of our study.

      (2) MARK2 localization to the Golgi apparatus

      - The authors investigated the positioning of endogenous MARK2 via immunofluorescence staining, and of exogenous Flag-tagged MARK2 in a KO background. The description of the protocol required to visualize Golgi localization of MARK2 is inconsistent between the Results and Methods text. The Results text reads as though the 2% serum incubation occurs as a blocking step following fixation. Conversely, the Methods section describes the 2% serum incubation as occurring just prior to fixation as a form of serum starvation. The authors need to clarify which of these protocols is correct. Further, while I can appreciate that a mechanistic understanding of why serum starvation is required for MARK2 Golgi localization is beyond the scope of the current work, the authors should at a minimum speculate in the Discussion as to why they think it might occur.

      We sincerely thank the reviewer for the constructive feedback on the localization of MARK2 at the Golgi. Due to the complexity and variability of this phenomenon, we decided to remove the related data from the current manuscript to maintain the rigor of our study. However, we have included a discussion of this phenomenon in the Discussion section (page 13, lines 31-39 and page 14, lines 1-6 in the revised manuscript) and plan to investigate it further in future studies.

      The localization of MARK2 at the Golgi was initially observed in experiments following serum starvation, in which cells were fixed and stained (data not shown). This observation was supported by the loss of Golgi localization in MARK2 knockdown cells, indicating the specificity of the antibody (data not shown). However, this phenomenon was not consistently observed across all cells, likely due to its transient nature. We speculate that the localization of MARK2 to the Golgi depends on its activity and post-translational modifications. For example, phosphorylation at T595 has been reported to regulate the translocation of MARK2 from the plasma membrane to the cytoplasm (Hurov et al., 2004). Serum starvation might induce modifications or conformational changes in MARK2, leading to its temporary Golgi localization. Additionally, we hypothesize that this localization may coincide with specific Golgi dynamics, such as the transition from dispersed to ribbon-like structures during cell migration.

      We also acknowledge the inconsistency in the Results and Methods sections regarding serum starvation. We confirm that serum starvation was performed prior to fixation as an experimental condition, rather than as a blocking step in immunostaining. This clarification has been incorporated into the revised Methods section (page 24, lines 11-12).

      We hope this clarification, along with our planned future studies, adequately addresses the reviewer’s concerns. Once again, we deeply appreciate the reviewer’s valuable comments, which have provided important insights for our ongoing work.

      References:

      Hurov, J.B., Watkins, J.L., and Piwnica-Worms, H. (2004). Atypical PKC phosphorylates PAR-1 kinases to regulate localization and activity. Curr Biol 14 (8): 736-741.

      - The authors should strengthen their findings by using validated tools/methods consistent with previous publications. For example, the Waterman lab has published two MARK2 constructs, Apple- and eGFP-tagged versions (doi.org/10.1016/j.cub.2022.04.088), and has described the localization of MARK2 in U2OS cells using the same antibody (anti-MARK2 C-terminal, Abcam Cat# ab136872). The authors should (1) image the cells live using eGFP-tagged MARK2 during serum starvation to show the dynamics of this localization, and (2) image U2OS cells stained with the Abcam ab136872 antibody with and without 2% serum starvation. Two MARK2 antibodies are listed in Table 2. Does the Abcam ab133724 antibody show a similar localisation?

      - The Golgi localization of MARK2 occurs in the absence of the T structural domain, but not when full-length MARK2 is expressed. The authors conclude that the T domain is likely inhibitory. Given that serum starvation is also required for this localization, the authors should clarify the physiological relevance of these observations.

      We sincerely thank the reviewer for their valuable suggestions regarding the use of tools and methods and the physiological relevance of MARK2 localization to the Golgi. Regarding the question of how MARK2 itself localizes to the Golgi, we are currently unable to fully elucidate the underlying mechanism. Therefore, we have removed the discussion of MARK2’s Golgi localization from the manuscript to ensure scientific accuracy. Nevertheless, we provide our detailed response below:

      First, regarding the suggestion to use tools and methods developed by the Waterman lab to strengthen our findings, we have carefully evaluated their applicability. In our live-cell imaging experiments, we found that full-length MARK2 does not stably localize to the Golgi, even under serum starvation conditions. However, truncated MARK2 mutants lacking the Tail (T) domain exhibit robust Golgi localization. Furthermore, our immunofluorescence staining results indicate that the Spacer domain is the minimal region required for MARK2 localization at the Golgi. Based on these findings, we believe that live-cell imaging of EGFP-tagged full-length MARK2 may not effectively reveal the dynamics of its Golgi localization. However, we plan to focus on the truncated constructs in future studies to better explore the mechanisms underlying MARK2's dynamic behavior. 

      Regarding the use of the ab136872 antibody to stain U2OS cells with and without serum starvation, we note that the protocol described by the Waterman lab involves pre-fixation and permeabilization steps, which are not compatible with live-cell imaging. Additionally, we observed that MARK2 Golgi localization appears to be condition-dependent and may coincide with specific Golgi dynamics, such as transitions from dispersed stacks to intact ribbon structures. These events are likely brief and challenging to capture consistently. Nevertheless, we recognize the value of this experimental design and plan to adapt the staining conditions in future work to validate our results further. As for the ab133724 antibody listed in Table 2, we clarify that it has only been validated for Western blotting in our study and does not yield reliable results in immunofluorescence experiments. For this reason, all immunofluorescence staining in this study relied exclusively on ab136872. This distinction has been clarified in the revised Table 2.

      Regarding the hypothesis that the Tail domain of MARK2 is inhibitory, our observations showed that truncated MARK2 mutants lacking the T domain stably localized to the Golgi, whereas full-length MARK2 did not. Literature evidence supports this hypothesis, as studies on the yeast homolog Kin2 indicate that the C-terminal region (including the Tail domain) binds to the N-terminal catalytic domain to inhibit kinase activity (Elbert et al., 2005). We speculate that serum starvation disrupts this intramolecular interaction, relieving the inhibition by the T domain, activating MARK2, and promoting its localization to the Golgi. Moreover, we hypothesize that the transient nature of MARK2 localization to the Golgi may be related to specific Golgi remodeling processes, such as the transition from dispersed stacks to intact ribbon structures during cell migration or polarity establishment.

      References:

      Elbert, M., Rossi, G., and Brennwald, P. (2005). The yeast par-1 homologs kin1 and kin2 show genetic and physical interactions with components of the exocytic machinery. Mol Biol Cell 16 (2): 532-549.

      (3) Phosphorylation of CAMSAP2 by MARK2

      - The authors examined the effects of MARK2 phosphorylation of CAMSAP2 on Golgi architecture through expression of WT-CAMSAP2 and two CAMSAP2 S835 mutants in CAMSAP2 KO cells. They find that CAMSAP2 S835A (non-phosphorylatable) was less capable of rescuing Golgi morphology than CAMSAP2 S835D (phosphomimetic). Golgi area was measured to demonstrate this phenomenon. Representative immunofluorescence images in Fig. 4D appear to indicate that this is the case. However, quantification in Fig. 4E does not show a significant difference between HA-CAMSAP2 and HA-CAMSAP2A that would support the initial claim. The authors could analyze other aspects of Golgi morphology (e.g. number of Golgi fragments, degree of dispersal around the nucleus) to capture the clear structural defects shown in HA-CAMSAP2A cells.

      We sincerely thank the reviewer for their valuable feedback and for pointing out potential areas of improvement in our analysis of Golgi morphology. We apologize for any misunderstanding caused by our description of the results in Figure 4E.

      The quantification indeed shows a significant difference between HA-CAMSAP2 and HA-CAMSAP2A in terms of Golgi area, as indicated in the figure by the statistical annotations (p-value provided in the legend). To ensure clarity, we have revised the figure legend (page 32, lines 19-23 in the revised manuscript) to explicitly describe the statistical significance and the method used for quantification.

      Because this difference in Golgi area is significant, and to maintain consistency throughout the manuscript, we did not further analyze other aspects of Golgi morphology.

      We hope this clarification, along with the additional analyses, will address the reviewer’s concerns. Once again, we are deeply grateful for these constructive comments, which have helped us improve the quality and robustness of our study.

      - Wound edge assays are used to capture the difference in Golgi reorientation towards the leading edge between CAMSAP2 S835A and CAMSAP2 S835D. However, these studies lack comparison to WT-CAMSAP2 that would support the role of phosphorylated CAMSAP2 in reorienting the Golgi in this context.

      We sincerely thank the reviewer for their insightful suggestion. In response, we have added a comparison between CAMSAP2 S835A/D and WT-CAMSAP2, in addition to HT1080 and MARK2 KO cells, to better evaluate the role of phosphorylated CAMSAP2 in Golgi reorientation.

      The results, now shown in Figure 5A-C, indicate that in the absence of MARK2, there is no significant difference in Golgi reorientation between WT-CAMSAP2 and CAMSAP2 S835A. This observation supports the conclusion that MARK2-mediated phosphorylation of CAMSAP2 at S835 is essential for effective Golgi reorientation.

      To enhance clarity, we have updated the corresponding Results section (page 9, lines 37-40 and page 10, line 1 in the revised manuscript) to describe this additional comparison. We believe this analysis strengthens our findings and provides a clearer understanding of the role of phosphorylated CAMSAP2 in Golgi dynamics.

      We hope this additional data addresses the reviewer’s concerns. Once again, we are grateful for the constructive feedback, which has helped improve the clarity and robustness of our study.

      (4) Identification of CAMSAP2 interaction partners

      - Quantification of the interaction between CAMSAP2 and CG-NAP, CLASP2, or USO1 in Fig. 5D, 5F, and 5J, respectively, lacks WT-CAMSAP2 comparisons.

      We sincerely thank the reviewer for their valuable suggestion. In response, we have included WT-CAMSAP2 data in the quantification of interaction ability between CAMSAP2 and CG-NAP, CLASP2, and USO1. These results, now shown in revised Figures 5 D-G and Figures 6 C-D, provide a direct comparison that further validates the differential interaction abilities of CAMSAP2 mutants.

      The inclusion of WT-CAMSAP2 allows us to better contextualize the effects of specific mutations on CAMSAP2 interactions and strengthens our conclusions regarding the role of these interactions in Golgi dynamics.

      We hope this addition addresses the reviewer’s concerns and enhances the clarity and robustness of our study. We deeply appreciate the constructive feedback, which has been instrumental in improving our manuscript.

      - The CG-NAP immunoblot presented in Fig. 5C shows that the protein is 310 kDa, which is the incorrect molecular weight. CG-NAP (AKAP450) should appear at around 450 kDa. Further, no CG-NAP antibody is included in Table 2 - Information of Antibodies. The authors need to explain this discrepancy.

      We sincerely apologize for the lack of clarity in our annotation and description, which may have caused confusion regarding the CG-NAP immunoblot presented in Figure 5C (Figure 5D in the revised manuscript). To clarify, CG-NAP (AKAP450) is indeed a 450 kDa protein, and the marker at 310 kDa represents the molecular weight marker’s upper limit, above which CG-NAP is observed. This has been clarified in the figure legend (page 33, lines 21-23 in the revised manuscript).

      Regarding the CG-NAP antibody, it was custom-made and purified in our laboratory. Polyclonal antisera against CG-NAP, designated αEE, were generated by immunizing rabbits with GST-fused fragments of CG-NAP (aa 423–542). This antibody has been validated extensively in our previous research, demonstrating its specificity and reliability (Wang et al., 2017). The details of the antibody preparation are included in the footnote of Table 2 for reference.

      We hope this clarification, along with the additional context regarding the antibody validation, resolves the reviewer’s concerns. We are deeply grateful for the reviewer’s attention to detail, which has helped us improve the clarity and rigor of our manuscript.

      References:

      Wang, J., Xu, H., Jiang, Y., Takahashi, M., Takeichi, M., and Meng, W. (2017). CAMSAP3-dependent microtubule dynamics regulates Golgi assembly in epithelial cells. J Genet Genomics 44 (1): 39-49.

      Minor Comments

      - The authors should change immunofluorescence images to colorblind-friendly colors. The current presentation of merged overlays makes them very difficult to interpret; I would strongly encourage inverted or, at a minimum, greyscale individual images of the key proteins of interest.

      We sincerely thank the reviewer for their valuable suggestion regarding the presentation of immunofluorescence images. In response, we have converted the images in Figure 1C to greyscale individual images for each key protein of interest. This adjustment ensures that the figures are more accessible and interpretable, including for readers with color vision deficiencies.

      We hope this modification addresses the reviewer’s concern and improves the clarity of our data presentation. We are grateful for the constructive feedback, which has helped us enhance the overall quality of our figures.

      - On p. 8 text should be amended to 'Previous literature has documented MARK2's localization to the microtubules, microtubule-organizing center (MTOC), focal adhesions..'

      We sincerely thank the reviewer for their comment regarding the text on page 8. Considering the reasoning provided in response to question 2, where we clarified that MARK2's Golgi localization is not fully understood, we have decided to remove this section from the manuscript to maintain the accuracy and rigor of our study.

      We appreciate the reviewer’s attention to detail and constructive feedback, which has helped us improve the clarity and focus of our manuscript. 

      - In Fig.1A scale bars are not shown on individual channel images of CAMSAP or GM130

      We sincerely thank the reviewer for pointing out the omission of scale bars in the individual channel images of CAMSAP and GM130 in Figure 1A (Figure 1C in the revised manuscript). In response, we have added a scale bar (5 μm) to the CAMSAP2 channel, as shown in the revised Figure 1C. These updates have been described in the figure legend (page 29, line 21).

      We hope this modification addresses the reviewer’s concern and improves the accuracy and clarity of our figure presentation. We greatly appreciate the reviewer’s constructive feedback, which has helped enhance the quality of our manuscript.

      - In Fig. 1B the title should be amended to 'Colocalization of CAMSAP2/GM130'

      We sincerely thank the reviewer for their suggestion to amend the title in Figure 1B (Figure 1D in the revised manuscript). In response, we have updated the title to "Colocalization of CAMSAP2/GM130," as shown in the revised Figure 1D.

      We hope this modification addresses the reviewer’s concern and improves the clarity and accuracy of the figure. We greatly appreciate the reviewer’s valuable feedback, which has helped us refine the presentation of our results.

      - In Fig. 2F, 5A, and Sup Fig 3C scale bars have been presented vertically

      We sincerely thank the reviewer for pointing out the issue with the vertical orientation of scale bars in Figures 2F (Figure 2D in the revised manuscript), 5A, and Supplementary Figure 3C. In response, we have modified the scale bars in revised Figures 2D and 5A to a horizontal orientation for improved consistency and clarity. Additionally, Supplementary Figure 3C has been removed from the revised manuscript.

      We hope these adjustments address the reviewer’s concerns and enhance the overall presentation quality of the figures. We greatly appreciate the reviewer’s constructive feedback, which has helped us refine our manuscript.

      - Panels are not correctly aligned, and images are not evenly spaced or sized in multiple figures - Fig. 2F, 4D, Sup Fig. 1F, Sup Fig. 2C, Sup Fig. 3E, Sup Fig. 4C

      We sincerely thank the reviewer for pointing out the misalignment and uneven spacing or sizing of panels in multiple figures, including Figures 2F, 4D, and Supplementary Figures 1F, 2C, 3E, and 4C (Figures 2D and 4D, and Supplementary Figures 1F, 2C, and 3H in the revised manuscript; Supplementary Figure 3E has been removed). In response, we have standardized the spacing and sizing of all panels throughout the manuscript to ensure consistency and improve visual clarity.

      We hope this modification addresses the reviewer’s concerns and enhances the overall presentation quality of our figures. We greatly appreciate the reviewer’s constructive feedback, which has helped us improve the organization and professionalism of our manuscript.

      - An uncolored additional data point is present in Fig. 3F

      We sincerely thank the reviewer for pointing out the presence of an uncolored additional data point in Figure 3F. In response, we have removed this data point from the revised figure to ensure accuracy and clarity.

      We hope this adjustment resolves the reviewer’s concern and improves the overall quality of the figure. We greatly appreciate the reviewer’s careful review and constructive feedback, which have helped us refine our manuscript.

      - In Fig. 3A 'GAMSAP2/GM130' in the vertical axis label should be amended to 'CAMSAP2/GM130'

      We sincerely thank the reviewer for pointing out the error in the vertical axis label of Figure 3A. In response, we have corrected "GAMSAP2/GM130" to "CAMSAP2/GM130," as shown in the revised Figure 3I.

      We hope this correction resolves the reviewer’s concern and improves the accuracy of our figure. We greatly appreciate the reviewer’s careful review and constructive feedback, which have helped us refine our manuscript.

      - In Fig 5A the green label should be amended to 'GFP-CAMSAP2' instead of 'GFP'

      We sincerely apologize for the confusion caused by our labeling in Figure 5A. To clarify, the green label “GFP” refers to the antibody used, while “GFP-CAMSAP2” is indicated at the top of the figure to specify the construct being analyzed.

      We hope this explanation resolves the misunderstanding and provides clarity regarding the labeling in Figure 5A. We greatly appreciate the reviewer’s feedback, which has allowed us to address this issue and improve the precision of our figure annotations.

      - The repeated use of contractions throughout the manuscript was distracting, I would strongly encourage removing these.

      We sincerely thank the reviewer for pointing out the distracting use of contractions in the manuscript. In response, we have removed and replaced all contractions with their full forms to improve the clarity and formal tone of the text.

      We hope this modification addresses the reviewer’s concern and enhances the readability and professionalism of our manuscript. We greatly appreciate the reviewer’s constructive feedback, which has helped us refine the quality of our writing.

      Reviewer #2: 

      Summary  

      This work by the Meng lab investigates the role of the proteins MARK2 and CAMSAP2 in Golgi reorientation during cell polarisation and migration. They identified that both proteins interact and that MARK2 phosphorylates CAMSAP2 on residue S835. They show that this phosphorylation affects the localisation of CAMSAP2 at the Golgi apparatus and in turn influences the Golgi structure itself. Using the TurboID experimental approach, the authors identified USO1 as a protein that binds differentially to CAMSAP2 depending on its phosphorylation at residue 835. Dissecting the molecular mechanisms controlling Golgi polarisation during cell migration is a highly complex but fundamental issue in cell biology, and the authors may have identified one important key step in this process. However, although the authors have made a genuine iconographic effort to help the reader understand their point of view, the data presented in this study sometimes appear fragile, lacking rigour in the analysis, or over-interpreted. Additional analyses need to be conducted to strengthen this study and elevate it to the level it deserves.

      We sincerely thank the reviewer for their thoughtful evaluation and recognition of our study's significance in understanding Golgi reorientation during cell migration. We appreciate the constructive feedback regarding data robustness, clarity, and interpretation. In response, we have conducted additional analyses, revised data presentation, and ensured cautious interpretation throughout the manuscript. These changes aim to address the reviewer’s concerns comprehensively and strengthen the scientific rigor of our study.

      Major comments

      In order to conclude as they do about the putative role of USO1, the authors need to perform a siRNA/CRISPR of USO1 to validate its role in anchoring CAMSAP2 to the Golgi apparatus in a MARK2 phosphorylation-dependent manner. In other words, does depletion of USO1 affect the recruitment of CAMSAP2 to the Golgi apparatus?

      We sincerely thank the reviewer for their insightful suggestion regarding the role of USO1 in anchoring CAMSAP2 to the Golgi apparatus. In response, we performed USO1 knockdown using siRNA and quantified the Pearson correlation coefficient of CAMSAP2 and GM130 colocalization in control and USO1-knockdown cells.

      The results show that CAMSAP2 localization to the Golgi is significantly reduced in USO1-knockdown cells, confirming that USO1 plays a critical role in recruiting CAMSAP2 to the Golgi apparatus. These results are now presented in Figures 6E–G, and corresponding updates have been incorporated into the Results section (page 10, lines 36-37 in the revised manuscript).

      We hope this additional experiment addresses the reviewer’s concern and strengthens our conclusions regarding the role of USO1. We are grateful for the reviewer’s constructive feedback, which has greatly improved the robustness of our study.  

      It is not clear from this study exactly when and where MARK2 phosphorylates CAMSAP2. What is the effect of overexpressing the two proteins on their respective localisation to the Golgi apparatus? As binding between CAMSAP2 and MARK2 appears robust in the immunoprecipitation assay, this should be readily investigated.

      We sincerely thank the reviewer for their insightful comments and questions. To address the role of MARK2 in regulating CAMSAP2 localization to the Golgi apparatus, we overexpressed GFP-MARK2 in cells and compared its effects on CAMSAP2 localization to the Golgi with control cells overexpressing GFP alone. Our results show that CAMSAP2 localization to the Golgi is significantly increased in GFP-MARK2-overexpressing cells, as shown in Supplementary Figures 3C and 3E. Corresponding updates have been incorporated into the Results section (page 8, lines 25-27 in the revised manuscript).

      Regarding the question of how MARK2 itself localizes to the Golgi, we are currently unable to fully elucidate the underlying mechanism. Therefore, we have removed the discussion of MARK2’s Golgi localization from the manuscript to ensure scientific accuracy. Consequently, we have not conducted experiments to assess the effects of CAMSAP2 overexpression on MARK2’s localization to the Golgi.

      We hope this explanation clarifies the reviewer’s concerns. We are grateful for the reviewer’s constructive feedback, which has guided us in improving the clarity and focus of our study.

      To strengthen their results, can the authors map the interaction domains between CAMSAP2 and MARK2? The authors have at their disposal all the constructs necessary for this dissection.

      We sincerely thank the reviewer for their insightful suggestion to map the interaction domains between CAMSAP2 and MARK2. In response, we performed immunoprecipitation experiments using truncated constructs of CAMSAP2. Our results reveal that MARK2 interacts specifically with the C-terminus (1149F) of CAMSAP2, as shown in Supplementary Figures 3A and 3B. Corresponding updates have been incorporated into the Results section (page 7, lines 41-42 and page 8, line 1 in the revised manuscript).

      We hope this additional analysis addresses the reviewer’s suggestion and further strengthens our conclusions. We greatly appreciate the reviewer’s constructive feedback, which has helped improve the depth of our study.

      Minor comments

      Sup-fig1  

      H: It is not clear if the polarisation experiment has been repeated three times (as it should) and pooled or is just the result of one experiment?

      We sincerely apologize for the lack of clarity regarding the experimental details for Supplementary Figure 1H. To clarify, the polarization experiment was repeated three times, and the results were pooled to generate the data presented. We have updated the figure legend for Supplementary Figure 1H to explicitly state this information (page 35, lines 27-29 in the revised manuscript).

      We hope this clarification resolves the reviewer’s concern. We greatly appreciate the reviewer’s careful review and constructive feedback, which have helped us improve the accuracy and transparency of our manuscript.

      Sup-fig2  

      C: The phrase "Immunofluorescence staining plots" used in the legend is unclear. Which condition is presented in the panel, parental HT1080 or CAMSAP2 KO cells?

      We thank the reviewer for pointing out the lack of clarity regarding the conditions presented in Supplementary Figure 2C. To clarify, the immunofluorescence staining plots shown in this panel are from parental HT1080 cells. We have updated the figure legend to include this information (page 36, line 14 in the revised manuscript).

      We hope this clarification resolves the reviewer’s concern and improves the transparency of our data presentation. We greatly appreciate the reviewer’s feedback, which has helped us refine the manuscript.

      Figure 1  

      D: In the plot, the points for the "red cells" are red but those for the "blue cells" are green, which is confusing.

      E: Once again, the colour choice is confusing as blue cells (t=0.5h) are quantified using red dots and red cells (t=2h) quantified using green dots. The t=0h condition should be quantified as well and added to the graph.  

      F: Representative CAMSAP2 immunofluorescence pictures for the three time points should be provided in addition to the drawings.  

      We thank the reviewer for their valuable comments regarding Figure 1D (revised Figure 1E), Figure 1E (revised Figure 1B), and Figure 1F (revised Supplementary Figure 2C).

      - Figure 1D (revised Figure 1E): we have modified the x-axis labels and adjusted the color scheme of the data points to ensure consistency and avoid confusion.

      - Figure 1E (revised Figure 1B): we have updated the x-axis and included the quantification of the t=0h condition, which has been added to the graph.

      - Figure 1F (revised Supplementary Figure 2C): we have provided representative immunofluorescence images of CAMSAP2 for the three time points to complement the schematic drawings.

      We hope these revisions address the reviewer’s concerns and improve the clarity and completeness of our data presentation. We greatly appreciate the reviewer’s constructive feedback, which has significantly contributed to enhancing our manuscript.

      Figure 2  

      A: No methodology in the material and methods is provided for this analysis.  

      B: Can the authors be more precise regarding the source of the CAMSAP2 interactants? Can the authors provide the citation of the publication describing the CAMSAP2-MARK2 interaction?

      D: Genotyping for the MARK2 KO cell line should be provided in the same way as for the CAMSAP2 KO cell line in Sup-fig1. "MARK2 was enriched around the Golgi apparatus in a significant proportion of HT1080 cells": what proportion of the cells?

      F: The time point of fixation is missing  

      G: It is not clear whether the polarisation experiment was repeated three times (as it should be) and pooled, or whether it is the result of a single experiment.

      We thank the reviewer for their detailed comments and suggestions regarding Figure 2. Below, we provide clarifications and outline the modifications made:

      - Figure 2A: The methodology for this analysis has been added to section 5.14 (Data statistics). Specifically, we have stated: “GO analysis of proteins was plotted using https://www.bioinformatics.com.cn, an online platform for data analysis and visualization” (page 26 lines 5-6 in the revised manuscript).

      - Figure 2B: The CAMSAP2 interactants were derived from the study by Wu et al., 2016, which provides the source of these interactants. The interaction between CAMSAP2 and MARK2 is referenced from Zhou et al., 2020. These citations have been added to the relevant sections of the manuscript (page 30, lines 10-11 and 13-14).

      - Figure 2D (removed in the revised manuscript): Genotyping for the MARK2 KO cell line has been provided in the same format as for the CAMSAP2 KO cell line in Figure 2G. Additionally, because the Golgi localization of MARK2 cannot yet be fully elucidated, we have removed this portion from the manuscript.

      - Figure 2F (revised Figure 2D): The time point of fixation, which occurred 2 hours after the scratch wound assay, has been added to the figure legend (page 30, lines 15-16).

      - Figure 2G (revised Figure 2E-F): The polarization experiment was repeated three times, and the results were pooled. This information has been included in the figure legend (page 30, lines 26 and 29).

      We hope these updates address the reviewer’s concerns and improve the clarity and completeness of the manuscript. We are grateful for the reviewer’s constructive feedback, which has greatly enhanced the rigor of our study.

      References:

      Wu, J., de Heus, C., Liu, Q., Bouchet, B.P., Noordstra, I., Jiang, K., Hua, S., Martin, M., Yang, C., Grigoriev, I., et al. (2016). Molecular Pathway of Microtubule Organization at the Golgi Apparatus. Dev Cell 39 (1): 44-60.

      Sup-fig3  

      E: Although colocalisation between CAMSAP2 and MARK2 is clear under your serum conditions in HT1080 and RPE1 cells, the deletion domain analysis appears weak and insufficient to implicate the spacer domain. This part should be either strengthened or deleted; as it stands, the data do not satisfactorily support your conclusion.

      We sincerely thank the reviewer for their critical comments regarding the deletion domain analysis of MARK2 and its role in colocalization with CAMSAP2. As the current data do not satisfactorily support our conclusions, we have removed all related content on MARK2 and the deletion domain analysis from the manuscript to maintain scientific rigor.

      We appreciate the reviewer’s valuable feedback, which has helped us refine and improve the quality and focus of our study.

      Figure 3  

      A: Can the reduced CAMSAP2 Golgi localisation phenotype be rescued by the overexpression of MARK2 cDNA in the MARK2 KO cells?  

      F: Presence of a white dot on the HT1080 plot  

      G: The composition of the homogenization buffer is not indicated in the Materials and Methods.

      We thank the reviewer for their valuable comments and suggestions regarding Figure 3. Below, we detail the modifications made:

      - Figure 3A: To address whether the reduced CAMSAP2 Golgi localization phenotype can be rescued, we overexpressed MARK2 cDNA in MARK2 KO cells. Our results show that overexpression of MARK2 successfully rescues the reduced CAMSAP2 localization to the Golgi, as demonstrated in Supplementary Figures 3C and 3E (page 8, lines 5-7).

      - Figure 3F: We have removed the white dot on the HT1080 plot to ensure clarity and accuracy.

      - Figure 3G: The composition of the homogenization buffer used in the experiment has been added to the Materials and Methods section for completeness (page 24, lines 34-41 and page 25, lines 1-10).

      We hope these revisions address the reviewer’s concerns and enhance the clarity and rigor of our study. We are grateful for the reviewer’s constructive feedback, which has significantly improved the quality of our manuscript.

      Figure 4  

      B: Quantification of the effect of the S835A mutation should be provided  

      D: Top left panel: Why does the HA antibody stain the Golgi structure in the absence of HA-CAMSAP2 transfection? If the HA antibody has non-specific affinity for the Golgi apparatus, perhaps it is not the right tag to use in this assay.

      E: The number of cells studied should be standardized. 119 cells were analyzed in the CAMSAP2 KO condition versus only 35 cells in the CAMSAP2 KO (HA-CAMSAP2-S835D) condition. This could introduce a strong bias into the analysis. Furthermore, CAMSAP2 S835A seems to provide a certain level of rescue. It would be interesting to see the result of a t-test between the HT1080 and HA-CAMSAP2 S835A conditions.

      We thank the reviewer for their thoughtful comments and suggestions regarding Figure 4. Below, we detail the revisions and clarifications made:

      - Figure 4B: The S835A mutation renders CAMSAP2 non-phosphorylatable by MARK2. This conclusion is based on our experimental observations and previously reported mechanisms.

      - Figure 4D: The HA antibody does not exhibit non-specific affinity toward the Golgi apparatus. The observed labeling in the top left panel was due to an error in our annotation. We have corrected the label, replacing "HA" with "CAMSAP2" to accurately reflect the experimental conditions.

      - Figure 4E: To standardize the number of cells analyzed across conditions, we reduced the number of CAMSAP2 KO cells analyzed to 50 and balanced the sample sizes for comparison. Additionally, we performed a t-test between the HT1080 and HA-CAMSAP2 S835A conditions. The results support that CAMSAP2 S835A provides partial rescue, as reflected in the updated analysis (page 32, lines 19-23).

      We hope these revisions address the reviewer’s concerns and improve the accuracy and reliability of our results. We greatly appreciate the reviewer’s constructive feedback, which has significantly enhanced the quality of our study.

      Figure 6  

      6A: The wound position should be indicated on the picture.  

      6B: Given that microtubule labelling is present over the vast majority of the cell surface, this type of quantification provides very little information with conventional light microscopy, and Pearson's coefficient should not be used to conclude any change in the microtubule network. The text describing Figures 6A and 6B needs to be rewritten, as I do not understand what the authors want to say. "In cells located before the wound edge...": I do not understand how a cell could be located before the wound edge. Which figure corresponds to the trailing edge of the wounding?

      We thank the reviewer for their valuable comments on Figure 6A (revised Supplementary Figure 6E) and Figure 6B (revised Supplementary Figure 6F). Below, we detail the modifications made:

      - Figure 6A (revised Supplementary Figure 6E): we have added arrows to indicate the wound position, providing clearer guidance for interpreting the image.

      - Figure 6B (revised Supplementary Figure 6F): we revised our quantification method based on the approach used in the literature (Wu et al., 2016). Specifically, we analyzed the relationship between microtubules and the Golgi apparatus in cells at the leading edge of the wound. The x-axis represents the distance from the Golgi center, while the y-axis shows the normalized radial fluorescence intensity of microtubules and the Golgi apparatus.
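      To illustrate the kind of radial profile described above (intensity versus distance from the Golgi center), a minimal Python sketch follows. It is not the authors' analysis pipeline: it assumes a single-channel 2D image held as a NumPy array, a Golgi center supplied by the user in pixel coordinates, annuli of a fixed width, and normalization of each profile to its own maximum. The function name `radial_profile` and the `bin_width` parameter are hypothetical.

```python
import numpy as np

def radial_profile(image, center, bin_width=1.0):
    """Mean intensity in concentric annuli around `center`, scaled so the peak is 1.

    image: 2D array holding one fluorescence channel (assumption).
    center: (row, col) of the Golgi center in pixels, chosen by the user (assumption).
    bin_width: annulus width in pixels (hypothetical default).
    Returns (distances, normalized_profile).
    """
    img = np.asarray(image, dtype=float)
    rows, cols = np.indices(img.shape)
    # Euclidean distance of every pixel from the chosen center
    r = np.hypot(rows - center[0], cols - center[1])
    # Assign each pixel to an annulus index
    bins = (r / bin_width).astype(int)
    # Sum intensities and count pixels per annulus
    sums = np.bincount(bins.ravel(), weights=img.ravel())
    counts = np.bincount(bins.ravel())
    # Mean intensity per annulus (guard against empty annuli)
    profile = sums / np.maximum(counts, 1)
    peak = profile.max()
    if peak > 0:
        profile = profile / peak
    distances = np.arange(profile.size) * bin_width
    return distances, profile
```

      Under these assumptions, the microtubule and Golgi channels of the same cell would each be passed through the function with the same center, and the two normalized profiles plotted against distance, which compares their spatial distributions without relying on a whole-cell Pearson coefficient.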

      Additionally, we revised the accompanying text for clarity and accuracy. The original description:

      “In cells located before the wound edge, the Golgi apparatus maintained a ribbon-like shape, with a higher density of microtubules. In contrast, at the trailing edge of the wounding, the Golgi apparatus appeared more as stacks around the nucleus, with fewer microtubules” was replaced with:

      “Finally, to comprehensively understand the dynamics between non-centrosomal microtubules and the Golgi apparatus during Golgi reorientation, we conducted cell wound-healing experiments (Supplementary Figure 6 E-F). Our observations revealed notable changes in the Golgi apparatus and microtubule network distribution in relation to the wounding. These findings corroborate our earlier results and suggest a highly dynamic interaction between the Golgi apparatus and microtubules during Golgi reorientation” (Revised manuscript page 11 lines 3-10).

      We hope these changes address the reviewer’s concerns and improve the clarity and robustness of our study. We greatly appreciate the reviewer’s constructive feedback, which has significantly enhanced the presentation and interpretation of our data.

      References:

      Wu, J., de Heus, C., Liu, Q., Bouchet, B.P., Noordstra, I., Jiang, K., Hua, S., Martin, M., Yang, C., Grigoriev, I., et al. (2016). Molecular Pathway of Microtubule Organization at the Golgi Apparatus. Dev Cell 39 (1): 44-60.

      Reviewer #3:  

      Summary  

      In this study, Xu et al. analyzed the wound healing process of HT1080 cells to elucidate the molecular mechanisms by which the Golgi apparatus exhibits transient dispersion before reorienting to the wound edge in the compact assembly structure. They focused on the role of the microtubule minus-end binding protein CAMSAP2, which mediates the linkage between microtubules and the Golgi membrane. First, they noticed that CAMSAP2 transiently lost Golgi colocalization during the initial phase of the wound healing process. They further found that the cell polarity-regulating kinase MARK2 binds and phosphorylates S835 of CAMSAP2, thereby enhancing the interaction between CAMSAP2 and the Golgi protein Uso1. Together with the phenotypes of CAMSAP2, MARK2, and Uso1 KO cells, these authors argue that the MARK2-dependent phosphorylation of CAMSAP2 plays an important role in the reassembly and reorientation of the Golgi apparatus after a transient dispersion observed during the wound healing process.

      We sincerely thank the reviewer for their thoughtful summary of our study and constructive feedback. Your comments have been invaluable in refining our research and enhancing the clarity and impact of our manuscript.

      Major comments

      (1) The premise of this study was that during the wound healing process, the Golgi apparatus exhibits transient dispersion before reorientation to the front of the nucleus.  

      In the first place, this claim has not been well established in previous studies or this paper. Therefore, the authors should present a proof of this claim in a clearer manner.  

      To introduce this cellular event, the authors cite several papers in the introduction (page 4) and the results (page 6) sections. However, many papers cited are review articles, and some of them do not describe this change in the Golgi assembly structure before reorientation. Only two original articles discussed this phenomenon (Bisel et al. 2008 and Wu et al. 2016), and direct evidence was provided by only one paper (Wu et al. 2016) in which changes in the Golgi apparatus in wound-healing RPE1 cells were recorded by live imaging (Fig.7A in Wu et al. 2016).

      Furthermore, it should be noted that this previous paper demonstrated that depletion of CAMSAP2 inhibits Golgi dispersion. Obviously, this conclusion is inconsistent with the statement introducing this study (page 4) that “This emphasizes CAMSAP2's role in sustaining Golgi integrity during critical cellular events like migration.” In addition, it also contradicts the authors' model in the present paper (Fig. 6E), which argues that disruption of the Golgi association of CAMSAP2 facilitates Golgi dispersion.

      We sincerely thank the reviewer for their detailed comments and for providing us with the opportunity to clarify the premise and conclusions of our study. Below, we address the main concerns raised:

      First, to provide direct evidence of Golgi apparatus changes during the wound-healing process, we conducted live-cell imaging experiments. Our observations, presented in revised Supplementary Figure 2A, clearly demonstrate that the Golgi apparatus exhibits a transient dispersion state before reorienting toward the leading edge of the nucleus during migration.

      Regarding the interpretation of previous studies, we acknowledge the reviewer’s concerns about the citation of review articles. To address this, we have revisited the literature and clarified that the phenomenon of Golgi dispersion during reorientation has been directly demonstrated by Wu et al. (2016), whose live imaging of wound-healing RPE1 cells showed this dynamic behavior. Furthermore, we note that the Wu et al. paper explicitly demonstrates that CAMSAP2 depletion promotes Golgi dispersion, contrary to the reviewer’s interpretation that "depletion of CAMSAP2 inhibits Golgi dispersion."

      Our model focuses on the role of CAMSAP2 in restoring the Golgi from a transiently dispersed structure back to an intact ribbon-like structure during reorientation. Specifically, we propose that during this process, the disruption of CAMSAP2’s association with the Golgi affects this restoration, rather than directly promoting Golgi dispersion as suggested by the reviewer. We believe this distinction aligns with our data and the existing literature.

      To strengthen the background of our study, we have revised the introduction and results sections (page 6, lines 6-13 and page 7, lines 1-17) to minimize reliance on review articles and have provided more explicit citations to original research papers. We hope this addresses the reviewer’s concern about the sufficiency of the cited literature.

      We trust these clarifications and revisions resolve the reviewer’s concerns and enhance the robustness of our study. Once again, we are grateful for the reviewer’s constructive feedback, which has greatly helped refine our manuscript.

      References:

      Wu, J., de Heus, C., Liu, Q., Bouchet, B.P., Noordstra, I., Jiang, K., Hua, S., Martin, M., Yang, C., Grigoriev, I., et al. (2016). Molecular Pathway of Microtubule Organization at the Golgi Apparatus. Dev Cell 39 (1): 44-60.

      The authors did not provide experimental data for this temporal change in the Golgi assembly structures during the wound-healing process of HT1080 that they analyzed. They only provide an illustration of wound-healing cells (Fig.1F), in which cells are qualitatively discriminated and colored based on the Golgi states, without indicating the experimental basis of the discrimination.

      According to their ambiguous descriptions in the text (page 7), the reader can speculate that Fig. 1F was drawn based on the images in Supplementary Fig. 2C. However, because of the low quality and presentation style of these data, it is impossible to recognize the assembly structures of the Golgi apparatus in wound-edge cells.

      If the authors wish to establish this premise for the present paper, they should provide their own data corresponding to the present Supplementary Fig. 2C with greater clarity and present qualitative data verifying this claim, as Wu et al. did in Fig. 7A of their paper.

      We sincerely thank the reviewer for their constructive feedback and the opportunity to address the concern regarding the lack of experimental data supporting the temporal changes in Golgi assembly during the wound-healing process.

      To establish this premise, we conducted live-cell imaging experiments to observe the dynamic changes in the Golgi apparatus during directed cell migration. Our data, now presented in Supplementary Figure 2A, clearly demonstrate that the Golgi apparatus undergoes a transient dispersed state before reorganizing into an intact structure. These findings provide direct experimental evidence supporting our claim.

      In addition, we have revised the data originally presented in Supplementary Figure 2C and improved its quality and presentation style. This supplementary figure now includes clearer images and annotations to better illustrate the Golgi assembly structures in wound-edge cells. The improved presentation follows the standard set by Wu et al. (2016) and provides qualitative support for our observations.

      We hope these additions and revisions address the reviewer’s concerns and strengthen the scientific rigor and clarity of our manuscript. We are grateful for the reviewer’s valuable suggestions, which have significantly improved the quality of our study.

      References:

      Wu, J., de Heus, C., Liu, Q., Bouchet, B.P., Noordstra, I., Jiang, K., Hua, S., Martin, M., Yang, C., Grigoriev, I., et al. (2016). Molecular Pathway of Microtubule Organization at the Golgi Apparatus. Dev Cell 39 (1): 44-60.

      (2) In Fig. 1A-D, the authors claim that CAMSAP2 dissociates from the Golgi apparatus in cells "that have not yet completed Golgi reorientation and exhibit a transitional Golgi structure, characterized by relative dispersion and loss of polarity (page 7)." However, in these analyses, they do not analyze the initial stage (0.5 h after wound addition) of cells facing the wound edge, as they do in Supplementary Fig. 2C. Instead, they analyze cells separated from the wound edge at 2 h after wound addition, when the wound-edge cells have completed their polarization. These data are highly misleading because there is no evidence that the cells separated from the wound edge are really in the transitional state before polarization.

      In this regard, Fig. 1E shows the analysis of the wound-edge cells at 0.5 and 2 h after the addition of wound, which provides suitable data to verify the authors' claim. However, the corresponding legend indicates that these statistical data are based on the illustration in Fig. 1F, which is probably based on highly ambiguous data in Supplementary Fig. 2C (see above).  

      Taken together, I strongly recommend that the authors remove Fig. 1A-D. Instead, they should include an improved figure corresponding to the present Supplementary Fig. 2C and present its statistical analysis, similar to the present Fig. 1E, for this claim.

      We sincerely thank the reviewer for their constructive feedback and recommendations. Below, we address the concerns raised regarding Figure 1A-D and Supplementary Figure 2C.

      To provide stronger evidence for the transitional state of the Golgi apparatus during reorientation and the dynamic regulation of CAMSAP2 localization, we conducted live-cell imaging experiments. These results, now presented in Supplementary Figure 2A, clearly demonstrate that the Golgi apparatus undergoes a transitional state characterized by dispersion before reorienting toward the leading edge.

      Additionally, we analyzed fixed wound-edge cells at different time points during directed migration to observe CAMSAP2’s colocalization with the Golgi apparatus. The results, shown in Figures 1A and 1B, reveal dynamic changes in CAMSAP2 localization, confirm its regulation during Golgi reorientation, and include a corresponding statistical analysis (page 7, lines 1-17).

      These updates ensure that our claims are supported by robust and unambiguous data.

      We hope these revisions address the reviewer’s concerns and provide clear and reliable evidence for the transitional state of the Golgi apparatus and CAMSAP2’s dynamic regulation. We are grateful for the reviewer’s constructive suggestions, which have greatly improved the quality and focus of our manuscript.

      (3) In Supplementary Fig. 5 and Fig. 4, the authors claim that MARK2 phosphorylates S835 of CAMSAP2.  

      There are many issues that need to be addressed; otherwise, the above claim cannot be considered reliable.

      First, the descriptions (in the text and method sections) and figures (Supplementary Fig.5) concerning the in vitro kinase assay and subsequent phosphoproteomic analysis are too immature and contain many errors.  

      Legend to Supplementary Fig. 5 is too immature for comprehension. It should be completely rewritten in a more comprehensive manner. The figure in Supplementary Fig. 5C is also too immature for understanding. They simply paste raw mass spectrometric data without any modification for presentation.  

      We sincerely apologize for the lack of clarity and inaccuracies in the original descriptions and figure legends for the in vitro kinase assay and phosphoproteomic analysis. We greatly appreciate the reviewer’s detailed comments, which have allowed us to address these issues comprehensively.

      To improve clarity and accuracy, we have rewritten the figure legend for the original Supplementary Figure 5 (now Supplementary Figure 4) as follows:

      (A): CBB staining of a gel with GFP-CAMSAP2, GST, and GST-MARK2. GFP-CAMSAP2 was expressed in Sf9 cells and purified. GST and GST-MARK2 were expressed in E. coli and purified.

      (B): Western blot analysis of an in vitro kinase assay. GST or GST-MARK2 was incubated with GFP-CAMSAP2 in kinase buffer (50 mM Tris-HCl pH 7.5, 12.5 mM MgCl2, 1 mM DTT, 400 μM ATP) at 30°C for 30 minutes. Reactions were stopped by boiling in the loading buffer.

      (C): Detection of phosphorylation at S835 in CAMSAP2 by mass spectrometry. The observed mass increases in b4, b5, b6, b7, b8, b10, b11, and b12 fragments indicate phosphorylation at Ser835.

      (D): Kinase assay samples analyzed using Phos-tag SDS-PAGE. HEK293 cells were cotransfected with the indicated plasmids. Band shifts of CAMSAP2 mutants were examined via western blot. Phos-tag was used in SDS-PAGE, and arrowheads indicate the shifted bands caused by phosphorylation.

      To address the reviewer’s concern about Supplementary Figure 5C, we have reformatted the mass spectrometry data to improve readability and presentation quality. The revised figure includes clearer annotations and graphical representations of the mass spectrometric evidence for phosphorylation at S835.

      We believe these updates enhance the comprehensibility and reliability of our data, providing robust support for our claim that MARK2 phosphorylates CAMSAP2 at S835. We hope these revisions address the reviewer’s concerns and demonstrate our commitment to improving the quality of our manuscript.

      The readers cannot understand how the authors purified GFP-CAMSAP2 for the kinase assay.

      The method section incorrectly states that the product was purified using Ni-resin.  

      We thank the reviewer for their comment regarding the purification of GFP-CAMSAP2 for the kinase assay. We would like to clarify that GFP-CAMSAP2 carries a His-tag, which allows for purification using Ni-resin, as described in the Methods section (page 23, Lines 32-40). Therefore, the description in the Methods section is correct.

      To avoid any potential misunderstanding, we have revised the Methods section to provide more detailed and precise descriptions of the purification process. Specifically, GFP-CAMSAP2 was cloned into the pOCC6_pOEM1-N-HIS6-EGFP vector, which includes a His-tag, and was expressed in Sf9 cells. The His-GFP-CAMSAP2 protein was purified using Ni-resin chromatography. Relevant details have been added to the Methods section (page 21, Lines 34-36: “CAMSAP2 was cloned into the pOCC6_pOEM1-N-HIS6-EGFP vector expressed in Sf9, purified as His-GFP-CAMSAP2.”; page 23, Lines 32-33: “His-GFP-CAMSAP2 was cotransfected with bacmids into Sf9 cells to generate the passage 1 (P1) virus.”).

      We hope these clarifications and revisions address the reviewer’s concern and improve the comprehensibility of our experimental details. We appreciate the reviewer’s feedback, which has helped us refine the manuscript.

      Relatedly, GST and GST-MARK2 are described as having been purified from Sf9 insect cells in the text (page 9) and in the legend to Supplementary Fig. 5, but from E. coli in the Methods section. Which is correct?

      We thank the reviewer for pointing out the inconsistencies in the descriptions regarding the source of GST and GST-MARK2. To clarify, both GST and GST-MARK2 were purified from E. coli, as stated in the Methods section (page 23, Lines 26-31). We have corrected the erroneous descriptions in the main text (page 8, Lines 35-36) and the legend to Supplementary Figure 4 to ensure consistency.

      Additionally, we have updated the legend for Supplementary Figure 4A to state the sources of each protein explicitly:

      “GFP-CAMSAP2 was expressed in Sf9 cells and purified. GST and GST-MARK2 were expressed in E. coli and purified.” (page 38, Lines 2-3)

      These revisions ensure that the experimental details are accurate and consistent across the manuscript, eliminating any potential confusion. We appreciate the reviewer’s careful review and constructive feedback, which have helped us improve the clarity and reliability of our study.

      Because the phosphoproteomic data (Supplementary Fig. 5C) are not provided clearly, the experimental data for Fig.4A, in which possible CAMSAP2 phosphorylation sites are illustrated, are completely unknown. For me, it is highly strange that only the serine residues are listed in Fig. 4A.

      We sincerely thank the reviewer for raising this important point regarding Figure 4A and the phosphoproteomic data in Supplementary Figure 5C.

      - Phosphorylation Sites in Figure 4A

      The phosphorylation sites illustrated in Figure 4A are derived from our analysis of the original mass spectrometry data. These sites were included based on their high confidence scores and data reliability. Importantly, only serine residues met the stringent criteria for inclusion, as no threonine or tyrosine residues had sufficient evidence for phosphorylation. To clarify this, we have updated the figure legend for Figure 4A (page 32, Lines 3-7).

      - Improvements to Supplementary Figure 5C (Supplementary Figure 4D in the revised manuscript)

      To enhance transparency and clarity, we have reformatted Supplementary Figure 4D to include clearer annotations. The revised figure highlights the phosphopeptides used to identify the phosphorylation sites and provides a more comprehensive presentation of the mass spectrometry data. To clarify this, we have updated the figure legend for Supplementary Figure 4D (page 38, Lines 11-13).

      - Data Availability

      We will follow the journal’s guidelines by uploading the raw mass spectrometry data to the required public database upon manuscript acceptance. This ensures that the data are accessible and reproducible in compliance with journal standards.

      We hope these clarifications and updates address the reviewer’s concerns and improve the reliability and comprehensibility of our data presentation. We greatly appreciate the reviewer’s constructive feedback, which has helped us enhance the rigor and clarity of our manuscript.

      Considering the crude nature of the GST-MARK2 sample used for the in vitro kinase assay (Supplementary Fig. 5A), it is unclear whether MARK2 is responsible for all phosphorylation sites on CAMSAP2 detected in the phosphoproteomic analysis. Furthermore, if GFP-CAMSAP2 was purified from Sf9 insect cells, these sites might have been phosphorylated before incubation for the in vitro kinase assay. The authors should address these issues by including a negative control using the kinase-dead mutant of MARK2 in their in vitro kinase assay.

      We sincerely thank the reviewer for raising these important points regarding the potential pre-phosphorylation of GFP-CAMSAP2 and the role of MARK2 in the phosphorylation sites detected in our analysis.

      To address the possibility that GFP-CAMSAP2 may have been pre-phosphorylated during its expression in Sf9 insect cells, we conducted an in vitro comparison. Specifically, we compared the band shifts observed in GST-MARK2 + GFP-CAMSAP2 versus GST + GFP-CAMSAP2 under identical conditions. As shown in Supplementary Figure 4B, the GST-MARK2 + GFP-CAMSAP2 group exhibited a clear upward band shift compared to the GST + GFP-CAMSAP2 group, indicating additional phosphorylation events induced by MARK2.

      Regarding the inclusion of a kinase-dead MARK2 mutant as a negative control, we acknowledge this as a valuable suggestion for further confirming the specificity of MARK2 in phosphorylating CAMSAP2. While this experiment is not currently included, we plan to conduct it in our future studies to strengthen our findings.

      We hope this clarification and the provided evidence address the reviewer’s concerns. We are grateful for this constructive feedback, which has helped us critically evaluate and refine our experimental approach.

      (4) In Supplementary Fig.6A-C and Fig.5A-B, the authors claim that the phosphorylation of CAMSAP2 S835 is required for restoring the reduced reorientation of the Golgi in wound-healing cells and the delay in wound closure observed in MARK2 KO cells.  

      If the aforementioned claim is adequately supported by experimental data, it indicates that the defects in Golgi repolarization and wound closure in MARK2 KO cells can be mainly attributed to the reduced phosphorylation of S835 of CAMSAP2 in HT1080. Considering the presence of many well-known substrates of MARK2 for regulating cell polarity, this claim is highly striking.  

      However, to strongly support this conclusion, the authors should first perform a rescue experiment in MARK2 KO cells exogenously expressing MARK2. This step is essential to determine whether the defects observed in MARK2 KO cells are caused by the loss of MARK2 expression and not by other artifacts accidentally introduced during the generation of the present MARK2 KO clone.

      We sincerely thank the reviewer for their insightful suggestion regarding the rescue experiment to confirm that the defects observed in MARK2 KO cells are specifically caused by the loss of MARK2 expression.

      To address this, we performed a rescue experiment in MARK2 KO HT1080 cells by exogenously expressing GFP-MARK2. Our results, presented in Supplementary Figures 3C-E, demonstrate that GFP-MARK2 expression successfully restores the localization of CAMSAP2 on the Golgi apparatus in MARK2 KO cells.

      These findings strongly support the conclusion that the defects in Golgi architecture and CAMSAP2 Golgi localization are directly attributable to the loss of MARK2 expression, rather than any artificial effects potentially introduced during the generation of the MARK2 KO clone.

      We hope these additional experimental results address the reviewer’s concerns and provide robust evidence for the role of MARK2 in regulating Golgi reorientation and wound closure. We are grateful for the reviewer’s constructive feedback, which has significantly improved the rigor and clarity of our study.

      In addition, to evaluate the impact of the rescue effect of CAMSAP2, the authors should include the data of wild-type HT1080 and MARK2 KO cells in Fig. 5B to reliably demonstrate the aforementioned claim.  

      We thank the reviewer for their valuable suggestion to include data from wild-type HT1080 and MARK2 KO cells in Figure 5A-C to better evaluate the rescue effects of CAMSAP2.

      In response, we have incorporated data from wild-type HT1080 and MARK2 KO cells into Figure 5A-C. These additions provide a comprehensive comparison and further demonstrate the impact of CAMSAP2-S835A and CAMSAP2-S835D on Golgi reorientation relative to the wild-type and MARK2 KO conditions.

      These changes are reflected in Figures 5A-C.

      We hope these updates address the reviewer’s concerns and strengthen the reliability of our conclusions. We greatly appreciate the reviewer’s constructive feedback, which has significantly enhanced the robustness of our study.

      Principally, before checking the rescue effects in MARK2 KO cells, the authors should examine the rescue activity of the CAMSAP2 S835 mutants in restoring the reduced reorientation of the Golgi in wound-healing cells and the delay in wound closure observed in CAMSAP2 KO cells (Supplementary Fig. 1F-H and Supplementary Fig. 2A, B). These experiments are more essential to substantiate the authors' claim.

      We thank the reviewer for their insightful suggestion to examine the rescue activity of CAMSAP2 S835 mutants in CAMSAP2 KO cells to further substantiate our claims.

      In Figure 4D-F, we observed significant differences between CAMSAP2 S835 mutants in their ability to restore Golgi structure and localization, indicating functional differences between these mutants. To better reflect the regulatory role of MARK2-mediated phosphorylation of CAMSAP2, we performed scratch wound-healing experiments in MARK2 KO cells by establishing stable cell lines expressing CAMSAP2 S835 mutants. These experiments allowed us to assess Golgi reorientation during wound healing and are presented in Figure 5A-C.

      We also attempted to generate stable cell lines expressing GFP-CAMSAP2 and its mutants in CAMSAP2 KO cells. Unfortunately, these cells consistently failed to survive, preventing successful construction of the cell lines.

      We hope these experiments and explanations address the reviewer’s concerns. We are grateful for the reviewer’s constructive feedback, which has helped us refine and improve our study.

      (5) The data presented in Fig. 6A and B are not sufficient to support the authors' notion that "our observation revealed notable changes in the Golgi apparatus and microtubule network distribution in relation to the wounding. (page 11)"  

      Fig. 6A, which includes only a single-cell image in each panel, does not demonstrate the general state of microtubules and the Golgi in the wound-edge cells. The reader cannot even know the migration direction of each cell.  

      The data in Fig. 6B are not suitable to quantitatively support the authors' claim. The authors should find a way to quantitatively estimate the microtubule density around the Golgi and the shape and compactness of the Golgi in each cell facing the wound, rather than estimating the colocalization of microtubules and the Golgi, as in the present Fig. 6B.

      We sincerely apologize for the confusion caused by our unclear descriptions and presentation.

      Here, we clarify the purpose and improvements made to address the reviewer’s concerns. In this study, we primarily aimed to observe the relationship between microtubules and the Golgi apparatus in cells at the leading edge of the wound during directed migration. In Figure 6A (now Supplementary Figure 6E), the images represent cells located at the wound edge at different time points. To improve clarity, we have added arrows indicating the migration direction and updated the figure legend to describe these details (page 40 lines 13-14).

      To better quantify the relationship between microtubules and the Golgi apparatus, we revised our analysis following the quantitative method used in Figure 3F of the paper “Molecular Pathway of Microtubule Organization at the Golgi Apparatus.” Specifically, we performed a radial analysis of fluorescence intensity in cells at the wound edge, measuring the distance from the Golgi center (x-axis) and the normalized radial fluorescence intensity of microtubules and the Golgi (y-axis). These results are now presented in Supplementary Figures 6E and 6F.
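      To make the radial analysis concrete, the sketch below computes an intensity-weighted Golgi center and a ring-averaged intensity profile around it. It is an illustration under stated assumptions, not the authors' pipeline: images are plain lists of pixel rows, rings are 1 pixel wide by default, and each profile is normalized so its brightest ring equals 1. The response does not say which normalization was used, so that choice is an assumption. The function names and the channel variables in the usage comment are hypothetical.

```python
import math


def intensity_centroid(image):
    """Intensity-weighted centroid (x, y) of a 2D image given as a list of rows."""
    total = sx = sy = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            total += v
            sx += v * x
            sy += v * y
    return sx / total, sy / total


def radial_profile(image, center, max_radius, bin_width=1.0):
    """Mean intensity in concentric rings around `center`.

    Assumed normalization: the brightest ring is scaled to 1.
    """
    n_bins = int(math.ceil(max_radius / bin_width))
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    cx, cy = center
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            b = int(math.hypot(x - cx, y - cy) // bin_width)
            if b < n_bins:  # ignore pixels beyond max_radius
                sums[b] += v
                counts[b] += 1
    means = [s / c if c else float("nan") for s, c in zip(sums, counts)]
    peak = max(m for m in means if not math.isnan(m))
    return [m / peak for m in means]


# Hypothetical usage: find the Golgi center from the Golgi channel, then
# profile both channels around that same point.
# center = intensity_centroid(golgi_channel)
# golgi_profile = radial_profile(golgi_channel, center, max_radius=40)
# mt_profile = radial_profile(tubulin_channel, center, max_radius=40)
```

      Profiling both channels around the same Golgi-derived center lets the two curves be plotted on a common distance axis, which is the comparison the revised supplementary panels describe.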

      We hope these improvements address the reviewer’s concerns and provide stronger evidence for the changes in the Golgi apparatus and microtubule network distribution in relation to wound healing. We greatly appreciate the reviewer’s constructive feedback, which has significantly enhanced the clarity and rigor of our study.

      The legends to Fig. 6A and B indicate that they compared immunofluorescent staining of cells at the edge of the wound after 0.5 h and 2 h of migration. However, the authors state in the text that they compared "the cells located before the wound" and "the cells at the trailing edge of the wounding (page 11)." Although this description is highly ambiguous and misleading, if they compared the wound-edge cells and the cells separated from the wound edge at 2 h after cell migration here, they should improve the experimental design as I pointed out in the 2nd major comment.

      We thank the reviewer for their detailed feedback regarding the experimental design and the need to clarify our descriptions. We have addressed these concerns as follows:

      - Clarification of descriptions:

      We recognize that the previous description in the text regarding "the cells located before the wound" and "the cells at the trailing edge of the wounding" was ambiguous and potentially misleading. We have revised this text to accurately describe the experimental design. Specifically, we compared cells at the leading edge of the wound at different time points (0.5 h and 2 h post-migration). These corrections are reflected in the figure legends (Supplementary Figures 6E and 6F) and the Results section (page 11, lines 3-8).

      - Improved experimental design:

      To better support our conclusions, we performed live-cell imaging to observe the dynamic changes in the Golgi apparatus during directed migration. As shown in Supplementary Figure 2A, our results confirm that the Golgi apparatus undergoes a transient dispersed state before reorganizing into an intact structure.

      Additionally, we performed fixed-cell staining at different time points to analyze the colocalization of CAMSAP2 with the Golgi apparatus in cells at the leading edge of the wound. The colocalization analysis, presented in Figures 1A-C, further demonstrates the dynamic regulation of CAMSAP2 during Golgi reorientation.

      We hope these updates address the reviewer’s concerns and provide a clearer and more robust foundation for our conclusions. We are grateful for the reviewer’s constructive feedback, which has greatly enhanced the clarity and rigor of our study.

      Minor comments  

      (1) In Fig. 2 and Supplementary Fig. 3, the authors claim that MARK2 is enriched around the Golgi. However, this claim was based on immunofluorescent images of single cells and single-line scans.  

      It is better to present the statistical data for Pearson's coefficient as shown in Figs. 1D and E. To demonstrate MARK2 enrichment around the Golgi, rather than localization in the Golgi, the authors should find a way to quantify the specific enrichment of MARK2 signals in the Golgi region.

      We thank the reviewer for raising this important point regarding the enrichment of MARK2 around the Golgi apparatus. Upon further consideration, we acknowledge that our current data do not provide sufficient evidence to fully elucidate the mechanism of MARK2 localization to the Golgi.

      To maintain the scientific rigor of our study, we have removed this claim and the corresponding content from the manuscript, including original Figure 2 and Supplementary Figure 3, which specifically discuss MARK2 enrichment. These changes do not affect the primary conclusions of the study, which focus on the role of MARK2-mediated phosphorylation of CAMSAP2.

      We hope this clarification addresses the reviewer’s concerns. In the future, we plan to investigate the precise mechanism of MARK2 localization using additional experimental approaches. We are grateful for the reviewer’s constructive feedback, which has helped us refine the scope and focus of our manuscript.

      (2) In Fig. 3 and Supplementary Fig. 4, the authors report that CAMSAP2 localization on the Golgi is reduced in cells lacking MARK2.  

      Essentially, the present results support this claim. However, the authors should analyze the Golgi localization of CAMSAP2 with the same quantification parameter, because they used Pearson's coefficient in Fig. 1D, E and Supplementary Fig. 4D but Manders' coefficient in Fig. 3C and Fig. 4F.

      We thank the reviewer for their insightful comment regarding the consistency of quantification parameters used in our analysis of CAMSAP2 localization on the Golgi apparatus.

      To address this concern, we have revised Figure 3C to use Pearson’s coefficient for consistency with Figure 1D, 1E (Figure 1B and 1E in the revised manuscript), and Supplementary Figure 4D (Supplementary Figure 3I in the revised manuscript). This ensures uniformity in the quantification parameters across these analyses.

      For Figure 4F, we have retained Manders’ coefficient, as it accounts for variability in expression levels due to overexpression in individual cells. We believe this approach provides a more accurate reflection of CAMSAP2 localization under the experimental conditions shown in Figure 4F.
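      For readers comparing the two metrics, the sketch below shows how each is computed on paired pixel intensities. Pearson's r is the linear correlation of the two channels across all pixels (range -1 to 1), whereas Manders' M1 and M2 are the fractions of one channel's total intensity that fall in pixels where the other channel exceeds a threshold (range 0 to 1). The flat pixel lists and the default thresholds of 0 are simplifying assumptions; real analyses usually set thresholds from the image data. This is an illustration of the definitions, not the authors' analysis code.

```python
import math


def pearson(a, b):
    """Pearson's correlation coefficient between two equal-length pixel lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def manders(a, b, thr_a=0.0, thr_b=0.0):
    """Manders' coefficients (M1, M2).

    M1: fraction of total channel-a intensity in pixels where b > thr_b.
    M2: fraction of total channel-b intensity in pixels where a > thr_a.
    """
    m1 = sum(x for x, y in zip(a, b) if y > thr_b) / sum(a)
    m2 = sum(y for x, y in zip(a, b) if x > thr_a) / sum(b)
    return m1, m2


# Hypothetical four-pixel example: CAMSAP2 channel vs. Golgi channel.
camsap2 = [5, 0, 3, 2]
golgi = [4, 6, 0, 0]
print(manders(camsap2, golgi))                    # (0.5, 0.4)
print(manders([3 * v for v in camsap2], golgi))   # M1 unchanged: (0.5, 0.4)
```

      Because M1 is a ratio of intensities within the same channel, multiplying that channel by a constant (as with uniformly higher expression) leaves it unchanged, which is one way to read the rationale given above.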

      We hope these adjustments clarify our analysis and address the reviewer’s concerns. We greatly appreciate the reviewer’s constructive feedback, which has helped improve the consistency and accuracy of our study.

      (3) In Fig. 4D-F, the authors claim that S835 phosphorylation of CAMSAP2 is essential for its localization to the Golgi apparatus and for restoring the Golgi dispersion induced by CAMSAP2 depletion.

      Fig. 4E indicates that the S835A mutant of CAMSAP2 significantly restores the compact assembly of the Golgi apparatus, and the differences in the rescue activities of the wild type, S835A, and S835D are rather small. These data contradict the authors' conclusions regarding the pivotal role of MARK2-mediated phosphorylation at the S835 site of CAMSAP2 in maintaining the Golgi architecture (page 9). The authors should remove the phrase "MARK2-mediated" from the sentence unless addressing the aforementioned issues (see 3rd major comment) and describe the role of S835 phosphorylation in a more subdued tone.

      We thank the reviewer for their constructive feedback regarding the conclusions drawn about the role of MARK2-mediated phosphorylation of CAMSAP2 at S835.

      In response, we have revised the relevant sentence to reflect a more nuanced interpretation of the data. Specifically, the original statement:

      “These observations indicate that the phosphorylation of serine 835 in CAMSAP2 is essential for its proper localization to the Golgi apparatus.”

      has been updated to:

      “These observations indicate that MARK2 phosphorylation of serine at position 835 of CAMSAP2 affects the localization of CAMSAP2 on the Golgi and regulates Golgi structure” (page 9, Lines 27-29).

      We hope this modification addresses the reviewer’s concerns. We are grateful for the feedback, which has helped us refine our conclusions and enhance the clarity of our manuscript.

      (4) In Figs. 5I, J and Supplementary Fig. 7A-E, the authors claim that the S835 phosphorylation-dependent interaction of CAMSAP2 with Uso1 is essential for its localization to the Golgi apparatus.

      This claim was made based on immunofluorescent images of single cells and single-line scans, and was not sufficiently verified (Supplementary Fig. 7B, C). Because this is a crucial claim for the present paper, the authors should present statistical data for Pearson's coefficient, as shown in Fig. 1D and E, to quantitatively estimate the Golgi localization of CAMSAP2.

      We thank the reviewer for their suggestion to present statistical data using Pearson's coefficient for a more robust quantification of the Golgi localization of CAMSAP2.

      In response, we have revised the statistical analysis for Supplementary Figures 7B-C (Revised Figures 6F and 6G) to use Pearson's coefficient. This change ensures consistency with the quantification methods used in Figures 1D and 1E (Revised Figures 1B and 1E), allowing for a more standardized evaluation of CAMSAP2’s localization to the Golgi apparatus.

      We hope this modification addresses the reviewer’s concerns and strengthens the quantitative support for our claims. We are grateful for the reviewer’s constructive feedback, which has helped improve the rigor of our study.

      (5) The signal intensities of the immunofluorescent data in Fig. 4D, Fig. 5A, Sup-Fig. 3C and E, and Sup-Fig. 7S are too weak for readers to clearly evaluate the authors' claims. They should be improved appropriately.

      We thank the reviewer for highlighting the need to improve the clarity of the immunofluorescent data presented in several figures.

      In response, we have enhanced the signal intensities in Figures 4D, 5A, and Supplementary Figure 7D (Revised Supplementary Figure 6A) to make the signals clearer for readers, while ensuring that the adjustments do not alter the integrity of the original data. Supplementary Figures 3C and 3E have been removed from the manuscript.

      Additionally, to improve consistency and readability across the manuscript, we have standardized the quantification methods for similar analyses:

      For CAMSAP2 localization to the Golgi, Pearson's coefficient has been used throughout the manuscript. Figure 3C has been updated to use Pearson's coefficient for consistency.

      For Golgi state analysis in wound-edge cells, we have used the Golgi position relative to the nucleus as a uniform metric. This has been applied to Supplementary Figures 1F and 1G, Figures 2D and 2E, and Figures 5A and 5B.

      We hope these adjustments address the reviewer’s concerns and improve the clarity and consistency of our study. We greatly appreciate the reviewer’s constructive feedback, which has significantly enhanced the quality of our manuscript.

      (6) As indicated above, the authors frequently change the parameters or methods for quantifying the same phenomena (for example, the localization of CAMSAP on the Golgi and Golgi state in wound edge cells) in each figure. This is highly confusing. They should unify them.  

      We thank the reviewer for their valuable feedback regarding the inconsistency in quantification methods across the manuscript.

      To address this concern, we have carefully reviewed the entire manuscript and standardized the methods used for quantifying similar phenomena:

      - CAMSAP2 localization on the Golgi: 

      Pearson's coefficient is now consistently used throughout the manuscript. For example, Figure 3C has been updated to use Pearson's coefficient to align with other figures, such as Figures 1B and 1E.

      - Golgi state in wound-edge cells: 

      The Golgi state is now uniformly measured based on the position of the Golgi relative to the nucleus. This method has been applied to Supplementary Figures 1F and 1G, Figures 2D and 2E, and Figures 5A and 5B.
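      The response does not state the exact rule used to score Golgi position relative to the nucleus. As an assumption only, the sketch below uses a convention common in wound-healing studies: a cell is scored as oriented when its nucleus-to-Golgi vector falls within a 120° sector facing the wound. Under that convention, randomly positioned Golgi would score as oriented about one third of the time (120/360). The coordinates, the sector width, and all function names are hypothetical.

```python
import math


def golgi_angle_deg(nucleus_xy, golgi_xy, wound_dir_xy):
    """Unsigned angle (0-180 degrees) between the nucleus-to-Golgi vector
    and the direction pointing toward the wound."""
    gx = golgi_xy[0] - nucleus_xy[0]
    gy = golgi_xy[1] - nucleus_xy[1]
    wx, wy = wound_dir_xy
    # atan2(cross, dot) gives the signed angle between the two vectors.
    return abs(math.degrees(math.atan2(gx * wy - gy * wx, gx * wx + gy * wy)))


def is_oriented(nucleus_xy, golgi_xy, wound_dir_xy, sector_deg=120.0):
    """True if the Golgi lies inside a sector of `sector_deg` facing the wound."""
    return golgi_angle_deg(nucleus_xy, golgi_xy, wound_dir_xy) <= sector_deg / 2


def percent_oriented(cells, wound_dir_xy, sector_deg=120.0):
    """Percentage of (nucleus_xy, golgi_xy) pairs scored as oriented."""
    hits = sum(is_oriented(n, g, wound_dir_xy, sector_deg) for n, g in cells)
    return 100.0 * hits / len(cells)


# Hypothetical cells with the wound lying in the +x direction.
cells = [((0, 0), (1, 0)), ((0, 0), (1, 1)), ((0, 0), (0, 1)), ((0, 0), (-1, 0))]
print(percent_oriented(cells, (1, 0)))  # angles 0, 45, 90, 180 -> 50.0
```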

      We believe these changes significantly improve the clarity and consistency of the manuscript, ensuring that readers can easily interpret the data. We are grateful for the reviewer’s constructive feedback, which has greatly helped us enhance the quality and rigor of our study.

      (7) The legends frequently fail to clearly indicate the number of independent experiments on which each statistical analysis was based.  

      We thank the reviewer for highlighting the need to clearly indicate the number of independent experiments for each statistical analysis.

      In response, we have carefully reviewed the entire manuscript and updated the figure legends to include the number of independent experiments for every statistical analysis. This ensures transparency and allows readers to better evaluate the reliability of the data.

      We hope these updates address the reviewer’s concerns and improve the clarity and rigor of the manuscript. We appreciate the reviewer’s constructive feedback, which has helped us enhance the quality of our work.

      (8) Supplemental Figs. 4E and 4F are not cited in the text.  

      We thank the reviewer for pointing out that Supplemental Figures 4E and 4F were not cited in the text.

      To address this, we have updated the manuscript to cite these figures (Revised Figures 2H and 2I) in the appropriate section (page 8, lines 1-5).

      “the absence of MARK2 can also influence the orientation of the Golgi apparatus during cell wound healing and cause a delay in wound closure (Figure 2 D-I and Figure 3 D).”

      We hope this revision resolves the reviewer’s concern and improves the clarity and completeness of the manuscript. We appreciate the reviewer’s feedback, which has helped us refine our work.

      (9) The data in Fig. 3 analyzed MARK2 knockout cells (not knockdown cells). The caption should be corrected.  

      We thank the reviewer for pointing out the incorrect use of "knockdown" in the caption of Figure 3.

      To address this, we have revised the title of Figure 3 from:

      “MARK2 knockdown reduces CAMSAP2 localization on the Golgi apparatus.”

      to:

      “MARK2 affects CAMSAP2 localization on the Golgi apparatus.”

      This updated caption reflects the inclusion of both MARK2 knockout and knockdown cell lines analyzed in Figure 3.

      We hope this correction resolves the reviewer’s concern and ensures the accuracy of our manuscript. We greatly appreciate the reviewer’s attention to detail, which has helped us improve the clarity and consistency of our work.

      (10) The present caption in Fig. 6 disagrees with the content of the figure.  

      We thank the reviewer for pointing out the inconsistency between the caption and the content of Figure 6.

      To address this issue, we have revised the content of Figure 6 to ensure it aligns accurately with the caption. The updated figure now reflects the description provided in the caption, eliminating any discrepancies and improving clarity for the readers.

      We appreciate the reviewer’s constructive feedback, which has helped us enhance the accuracy and presentation of our manuscript.

      (11) What does "CS" indicate in Fig. 4B and Supplementary Fig. 5D? The style used to indicate point mutants of CAMSAP2 should be unified. 835A or S835A?

      We thank the reviewer for pointing out the inconsistency in the naming of CAMSAP2 mutants.

      To address this, we have revised all relevant figures and text to use the consistent format "S835A" and "S589A" for CAMSAP2 mutants. Specifically, in Figure 4B and Supplementary Figure 5D (now Supplementary Figure 4C), we have replaced the abbreviation "CS2" with "CAMSAP2" and updated the mutant names from "835A" and "589A" to "S835A" and "S589A," respectively.

      We hope these updates resolve the reviewer’s concerns and ensure clarity and consistency throughout the manuscript. We are grateful for the reviewer’s attention to detail, which has helped us improve the quality of our work.

      (12) Uso1 is not a Golgi matrix protein.  

      We thank the reviewer for pointing out the incorrect description of Uso1 as a Golgi matrix protein.

      In response, we have revised the manuscript to replace all references to “USO1 as a Golgi matrix protein” with “USO1 as a Golgi-associated protein.” This correction ensures that the terminology used in the manuscript is accurate and consistent with current scientific understanding.

      We appreciate the reviewer’s attention to detail, which has helped us improve the accuracy and quality of our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, De La Forest Divonne et al. build a repertoire of hemocytes from adult Pacific oysters by combining scRNAseq data with cytologic and biochemical analyses. Three categories of hemocytes were previously described in this species (i.e., blasts, hyalinocytes, and granulocytes). Based on scRNAseq data, the authors identified 7 hemocyte clusters presenting distinct transcriptional signatures. Using KEGG pathway enrichment and RBGOA, the authors determined the main molecular features of the clusters. In parallel, using cytologic markers, the authors classified 7 populations of hemocytes (i.e., ML, H, BBL, ABL, SGC, BGC, and VC) presenting distinct sizes, nucleus sizes, acidophilic/basophilic staining, presence of pseudopods, cytoplasm/nucleus ratios, and presence of granules. Then, the authors compared the phenotypic features with potential transcriptional signatures seen in the scRNAseq data. The hemocytes were separated on a density gradient to enrich for specific subpopulations. The cell composition of each fraction was determined using cytologic markers, and the fractions were analysed by quantitative PCR targeting major cluster markers (two per cluster). With this approach, the authors could assign cluster 7 to VC, cluster 2 to H, and cluster 3 to SGC. The other clusters did not show a clear association with this experimental approach. Using phagocytic assays, ROS, and copper monitoring, the authors showed that ML and SGC are phagocytic, ML produces ROS, and SGC and BGC accumulate copper. Then, with the density gradient/qPCR approach, the authors identified the populations expressing anti-microbial peptides (ABL, BBL, and H). Finally, the authors used Monocle to predict differentiation trajectories for each subgroup of hemocytes using cluster 4 as the progenitor subpopulation.

      The manuscript provides a comprehensive characterisation of the diversity of circulating immune cells found in Pacific oysters.

      Strengths:

      The combination of the two approaches offers a more integrative view.

      Hemocytes represent a very plastic cell population that has key roles in homeostatic and challenged conditions. Grasping the molecular features of these cells at the single-cell level will help understand their biology.

      This type of study may help elucidate the diversification of immune cells in comparative studies and evolutionary immunology.

      Weaknesses:

      The study should be more cautious about the conclusions, include further analyses, and situate the work within a more general framework.

      Reviewer #1 (Recommendations for the authors):

      The manuscript provides a comprehensive characterisation of the diversity of circulating immune cells found in Pacific oysters.

      Major comments:

      (1) The introduction would benefit from a clear description of what is known about immune cell development and diversity in this model. The literature on the origins and properties of the three subtypes (i.e., blasts, hyalinocytes, and granulocytes) should be described in the introduction.

      We thank Reviewer #1 for their valuable comments, which have allowed us to further improve our manuscript. We have enriched the introduction with the following addition (lines 79-82):

      “Blast-like cells are considered as undifferentiated hemocyte types (20), hyalinocytes (21) seem to be more involved in wound repair, and granulocytes, more implicated in immune surveillance. The latter are considered as the main immunocompetent hemocyte types (22).”

      (2) The authors mentioned a previous scRNAseq dataset produced in another oyster species. They should compare the two datasets to show the robustness of the molecular signatures determined in the present study. In addition, the authors do not mention markers identified in the literature that could be relevant to characterize the clusters (e.g., inflammatory pathway PMID: 29751033, proliferative markers PMID: 36591234/ PMID: 29317231, granulocyte markers PMID: 30633961 ... list not exhaustive). Overall, the comparison between this dataset and the available literature is too limited.

      We appreciate the reviewer’s suggestion to compare our dataset with previously published scRNAseq data and to integrate markers from the literature. Below, we address these points in detail.

      The transcription factors involved in hematopoiesis, such as Tal1, Sox, Runx, and GATA, are highly conserved across metazoans. These markers were identified in our dataset, consistent with findings in other species (13), including the previously mentioned scRNA-seq dataset in C. hongkongensis (4). However, defining robust and specific markers for distinct hemocyte types remains an ambitious goal that requires validation across diverse biological contexts, which is beyond the scope of the present study. Additionally, meaningful comparisons between datasets are constrained by differences in annotation frameworks and the absence of a standardized system for defining hemocyte subtypes. These limitations underscore the need for harmonization efforts to facilitate robust cross-study comparisons. Nonetheless, our dataset provides a strong foundation for future comparative analyses once such standardization is achieved.

      In response to the reviewer’s comment, we have added a paragraph to the discussion (lines 747 - 760) detailing that we identified conserved transcription factor markers in C. gigas and C. hongkongensis.

      (3) The authors sequenced 3000 cells without providing more comprehensive information/rationale on the analysed population. What is the number of hemocytes found in an adult? What proportion of the whole hemocyte population does this analysis represent? Does it include the tissue-interacting hemocytes? Also, what is the rationale for choosing that specific stage?

      We thank the reviewer for their insightful questions regarding the analyzed hemocyte population.

      Adult 18-month-old Crassostrea gigas contain approximately 1 million circulating hemocytes per mL of hemolymph, with an average of 1 mL of hemolymph per individual. Thus, this represents approximately 1 million circulating hemocytes per oyster. For our scRNA-seq analysis, we sampled 3,000 hemocytes, which corresponds to 0.3% of the total circulating hemocyte population.

      The number of cells processed was optimized to minimize the occurrence of doublets during scRNAseq. Following 10x Genomics Chromium guidelines, we loaded 4,950 cells to recover a target of 3,000 cells, with a doublet rate of 2.4%, just below the target threshold of 2.5%. This information has been added at line 125 of the document. As reported in Supplementary Table S1, the estimated number of cells after STAR-solo alignment was 2,937, close to the 3,000-cell target. This ensures the reliability and accuracy of the single-cell transcriptomic data.
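      The sampling figures in the two paragraphs above can be checked with simple arithmetic. The sketch below only uses the numbers stated in the response (about 1 million hemocytes per mL, about 1 mL of hemolymph, 3,000 targeted, 4,950 loaded, 2,937 recovered); the function name is a hypothetical helper, and the hemolymph volume is the approximate average given in the text.

```python
def sampling_summary(hemocytes_per_ml, hemolymph_ml, target, loaded, recovered):
    """Back-of-the-envelope ratios for the scRNA-seq sampling described above."""
    total = hemocytes_per_ml * hemolymph_ml
    return {
        "total_circulating": total,             # about 1,000,000 cells per oyster
        "fraction_sampled": target / total,     # 0.003, i.e. 0.3%
        "recovery_ratio": target / loaded,      # about 0.606 of loaded cells recovered
        "fraction_of_target": recovered / target,  # 2,937 / 3,000 = 0.979
    }


summary = sampling_summary(1_000_000, 1.0, 3000, 4950, 2937)
for key, value in summary.items():
    print(key, value)
```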

      We selected 18-month-old oysters for two key reasons: (i) to facilitate hemolymph collection, as hemocyte counts are more stable and sufficient at this stage, enabling us to collect enough cells for all planned experiments, including functional and cytological analyses; and (ii) to use oysters that are not susceptible to OsHV-1 μVar herpesvirus, which predominantly affects younger animals. This ensured that the hemocyte populations analyzed were not influenced by viral infections or related immune responses.

      Our study focused on circulating hemocytes collected from hemolymph, which does not include tissue-interacting hemocytes. While these cells may represent an additional population of interest, they fall outside the scope of our current investigation.

      By carefully selecting the animal stage and optimizing cell sampling, we ensured that the scRNA-seq dataset provides a robust representation of circulating hemocyte diversity while maintaining high data quality.

      (4) For the GO term enrichment analysis, the authors included all genes with a cluster enrichment of L2FC > 0.25. This threshold seems extremely low for identifying distinct functions for each cluster. The risk is to label as "cluster-specific" GO terms whose genes are only weakly enriched in the cluster. For the most important GO terms mentioned in the text, the authors should show the expression levels of the genes (with a DotPlot similar to Fig. 1D) to illustrate the specificity of each GO term. Finally, the GO enrichment scores were apparently calculated using the whole genome as background. The analysis, aiming at finding differences between hemocyte subgroups, should use the genes detected in the dataset as background.

      We appreciate the reviewer's concerns regarding the threshold used for GO term enrichment analysis and the choice of background genes. Below, we provide clarification on these points.

      For nuanced comparisons, such as those between activation states of the same cell type, lower thresholds for log2FC (e.g., ≥0.25) are commonly used to detect subtle regulatory shifts. In single-cell RNA sequencing (scRNA-seq) analyses, it is typical to use a log2FC threshold between 0.25 and 0.5 to ensure that biologically relevant, yet subtle, changes are captured. For our analysis, this threshold was chosen to maintain sensitivity to such shifts, particularly given the diversity and functional specialization of hemocyte clusters.
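      Because these thresholds are on a log2 scale, it may help to translate them into linear fold changes. The short conversion below uses only the identity fold change = 2^log2FC; the helper names are illustrative. It shows that the 0.25 to 0.5 range discussed above corresponds to roughly 19% to 41% higher expression, and that a log2FC of 0.4 corresponds to about a 1.32-fold change.

```python
def fold_change(log2fc):
    """Linear fold change corresponding to a log2 fold change."""
    return 2.0 ** log2fc


def percent_higher(log2fc):
    """Percent increase in expression implied by a log2 fold change."""
    return (2.0 ** log2fc - 1.0) * 100.0


for lfc in (0.25, 0.4, 0.5):
    print(f"log2FC {lfc}: {fold_change(lfc):.2f}-fold ({percent_higher(lfc):.0f}% higher)")
# log2FC 0.25: 1.19-fold (19% higher)
# log2FC 0.4: 1.32-fold (32% higher)
# log2FC 0.5: 1.41-fold (41% higher)
```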

      To address the reviewer's suggestion, we will include DotPlot representations (similar to Fig. 1D) for the most significant GO terms highlighted in the text. This will illustrate the expression levels of the associated genes across clusters and demonstrate their specificity to the identified GO terms.

      Regarding the background used in the GO enrichment analysis, we employed the Rank Based Gene Ontology Analysis (RBGOA) approach, which explicitly states in its documentation: "It is important to have the latter two tables representing the whole genome (or transcriptome) — at least the portion that was measured — rather than some select group of genes since the test relies on comparing the behavior of individual GO categories to the whole." Our analysis was conducted in agreement with these initial recommendations, ensuring that the results are consistent with the methodology outlined for RBGOA.
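      To illustrate why the reference set matters in a rank-based test, the sketch below implements a two-sided Mann-Whitney U test comparing the ranks of genes in one GO category against all other measured genes, which is, to our understanding, the core statistic underlying rank-based GO approaches such as RBGOA. This is not the RBGOA code: it uses a normal approximation without tie correction and omits any category clustering or multiple-testing correction. The scores and the category membership in the usage example are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def mwu_category_test(scores, in_category):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)
    of genes in one GO category versus every other gene in `scores`.

    `scores` should cover all measured genes: the test compares the category
    against that whole set, so restricting it would change the reference.
    """
    r = ranks(scores)
    n1 = sum(in_category)
    n2 = len(scores) - n1
    r1 = sum(ri for ri, c in zip(r, in_category) if c)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u1, z, p


# Hypothetical example: 20 genes ranked by a cluster statistic,
# with the 5 top-ranked genes belonging to the category.
scores = [float(i) for i in range(20)]
in_cat = [i >= 15 for i in range(20)]
print(mwu_category_test(scores, in_cat))
```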

      (5) The authors reannotated the genes of C. gigas to reach 73.1% annotation. What are the levels of annotations found prior to the reannotation? What do the scores/scale bars from the RBGOA analysis mean in Figures 2B-D?

      Thank you for your comment. The original annotation for C. gigas was based on the work of Penaloza et al. (5), which provided GO annotations for 18,750 out of 30,724 genes, corresponding to 61% annotation. Following our reannotation efforts, we were able to increase the annotation coverage to 73.1%, enhancing the resolution of downstream analyses. In response to the reviewer’s comment, we have updated the results section (lines 211 and 216) to explicitly include the original annotation coverage of 61% from the work of Penaloza et al., followed by details on our newly achieved annotation percentage of 73.1%.

      Thank you for pointing this out. We apologize for the oversight regarding the scale bar in Figures 2B-D. The colors in the original figure correspond to a z-score calculated from the gene ratio, which was not clearly explained and may have caused confusion. In the revised version of the manuscript, we propose a new representation to facilitate understanding and improve the clarity of the data presentation (Figure 2B).
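      For readers interpreting the original color scale, the sketch below shows one way a z-score can be derived from gene ratios. The response does not say over which dimension the z-score was computed. As an assumption, each GO term's gene ratios are standardized across clusters using the population standard deviation. The gene counts and list sizes in the usage example are hypothetical.

```python
import statistics


def gene_ratio(hits, list_size):
    """Fraction of a cluster's input gene list annotated to the GO term."""
    return hits / list_size


def zscores(values):
    """Standardize one term's gene ratios across clusters.

    Assumption: population standard deviation; a term with identical
    ratios in every cluster gets z = 0 everywhere.
    """
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]


# Hypothetical (hits, list size) pairs for one GO term in three clusters.
ratios = [gene_ratio(h, n) for h, n in [(12, 300), (30, 250), (5, 400)]]
print(zscores(ratios))
```

      A z-score of this kind shows where a term is relatively more or less represented across clusters, but not how large the underlying gene ratio is, which is one reason such a scale can be hard to read without explanation.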

      (6) The authors first describe the results of the KEGG enrichment analysis and then those of the RBGOA. To improve fluidity, I would suggest merging the results of both KEGG and RBGOA for each cluster.

      Thank you for the suggestion. To enhance the fluidity of the results section, we have redesigned the KEGG/RBGOA figure (see figure 2A and 2B) to present the results for each cluster in an integrated manner. This revised approach aims to provide a clearer and more cohesive representation of the findings.

      (7) The authors make correlations between gradient fractions containing multiple hemocyte populations and qPCR expression levels of cluster-specific markers to associate cytologic features with specific clusters. If feasible, I would recommend validating the association of several markers with hemocyte subgroups using in situ hybridisation or immunolabelling.

      Cytological identification of hemocytes in our study relies on MCDH staining, which provides detailed morphological and cytological information. Unfortunately, the fixation methods required for in situ hybridization (ISH) or immunolabeling are not compatible with those used for MCDH staining. We attempted to combine these approaches but found that the fixation protocols necessary for ISH or immunolabeling compromised the quality of the cytological features observed with MCDH staining. Consequently, such validation was not feasible within the constraints of our experimental setup.

      (8) Antimicrobial peptides (AMPs) are mentioned as enriched in agranular cells based on the gradient/qPCR analysis (Figure 6). Are these AMPs regulated by inflammatory pathways? Are any inflammatory pathways enriched in any scRNA-seq cluster? In addition, without validating the data by directly labelling AMPs in the different populations, it seems hard to conclude that AMPs are expressed only by agranular cells.

      In oysters, two families of antimicrobial peptides/proteins appear to be transcriptionally regulated in hemocytes in response to infection. The first is Cg-BigDefs (6). A 2020 study showed that CgBigDef1 expression is regulated by CgRel, an ortholog of the NF-κB transcription factor that also controls the expression of the proinflammatory cytokine CgIL17 (7). The second, Cg-BPI, is induced in response to infection, but its regulatory pathways remain unknown (8). The third well-characterized family of antimicrobial peptides, Cg-Defs, is constitutively expressed in hemocytes.

      In our scRNA-seq analysis, CgRel (G12420) shows an increased expression in cluster 5, with a log2FC of 0.4 (equivalent to a 1.32-fold change or 32% higher expression compared to other clusters). Cluster 5 corresponds to blast-like cells, which are transcriptionally distinct and predominantly found in fractions 1, 2, and 3. These same fractions exhibit the highest CgBigDef expression, as demonstrated by qPCR.
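      The conversion used above, from a log2 fold change of 0.4 to about 1.32-fold (roughly 32% higher), is simply 2 raised to the log2FC. A minimal Python sketch of that arithmetic follows. The function names are illustrative only.

```python
def log2fc_to_fold(log2fc):
    """Convert a log2 fold change into a linear fold change."""
    return 2 ** log2fc

def percent_higher(log2fc):
    """Percentage by which the cluster exceeds the other clusters."""
    return (log2fc_to_fold(log2fc) - 1) * 100

fold = log2fc_to_fold(0.4)  # CgRel (G12420) in cluster 5
print(f"{fold:.2f}-fold, {percent_higher(0.4):.0f}% higher")  # 1.32-fold, 32% higher
```

The same conversion gives about 1.71-fold for the SRCR gene G3876 (log2FC 0.77) and about 3.01-fold for Rpl22 (log2FC 1.59). In each case these values express enrichment relative to the other clusters.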

      In our qPCR results, we detected no expression of the three AMP families in cell-sorted granular cells, whereas cell-sorted agranular cells were positive for all three AMP families, including the inducible ones. Still, we agree that labelling of cell-sorted hemocyte populations would reinforce our data. We now specify in the text that further staining would be necessary to confirm these transcriptomic results (Discussion, lines 695 to 296).

      (9) The authors should play down some statements concerning cluster identity. In the absence of a true lineage tracing approach, it is possible that those clusters represent states rather than true cell subtypes. Immune cells are very plastic in nature and able to adapt to the environment, even in conditions that are considered homeostatic.

      We appreciate the reviewer’s insightful comment regarding the plasticity of immune cells and the potential for clusters to represent states rather than distinct cell subtypes. We agree that, in the absence of a lineage tracing approach, definitive classification of clusters as fixed subtypes is challenging. Immune cells, including those in invertebrates, are known for their high degree of plasticity and adaptability to environmental cues.

      In response to the reviewer’s comment, we have revised the Discussion section to include a statement clarifying that these clusters may represent dynamic states rather than fixed subtypes, thereby acknowledging the plasticity of immune cells (lines 766 to 770).

      (10) Related to the above issue, there is no indication of stem cells being present in the cell population. Is there any possibility to look for proliferative or progenitor markers? In homeostatic and in challenged conditions (for example Zymosan treatment)? This would provide some hints into the cellular pathways involved in the response. Perhaps determining the number/fraction of phagocytic cells in challenged conditions would help as well, in the absence of time-lapse assays.

      Thank you for highlighting the possibility of stem cells or progenitor markers in our hemocyte populations. In our current analysis, we did not detect any known stem cell or proliferative markers, nor evidence of a clearly defined hematopoiesis site in the hemolymph. Indeed, previous work suggests that oyster hematopoiesis may occur in tissues such as the gills, implying that stem or progenitor cells might not circulate in the hemolymph under homeostatic conditions. Consequently, the lack of proliferative cell populations in our data plausibly reflects, at least in part, their absence from the hemolymph, especially in naïve (unstimulated) oysters. To conclusively identify potential progenitor cells and their proliferative activity, further approaches involving deliberate perturbation of hemocyte homeostasis (such as immunological challenge, e.g., Zymosan treatment, combined with lineage-tracing or proliferation assays) would be necessary. These future investigations would not only clarify whether proliferative cells emerge in the hemolymph in response to environmental or pathological stimuli but also help elucidate the broader cellular pathways underlying oyster immune responses.

      In response to the reviewer’s comment, we have revised the Discussion (lines 742 to 745) and added: “Nevertheless, we did not detect any canonical stem or progenitor cell populations in our dataset, underscoring the need for future investigations - potentially involving immunological challenges and lineage-tracing assays - to clarify whether proliferative cells circulate in the hemolymph or instead reside primarily in tissue compartments.”

      (11) Could the authors discuss the phagocytic hemocytes in light of scavenger receptor expression?

      We thank the reviewer for this insightful question. Our study identifies macrophage-like cells and small granule cells as the principal phagocytes in Crassostrea gigas, capable of robust pathogen engulfment. Transcriptomic data reveal that these cell types express markers associated with endocytosis and immune defense pathways, such as CLEC and LACC24, which are integral to their phagocytic functionality.

      Interestingly, our single-cell RNA sequencing analysis indicates that cluster 3, corresponding to small granule cells, expresses the scavenger receptor cysteine-rich (SRCR) gene G3876, annotated as a low-density lipoprotein receptor-related protein, with a log2 fold change (log2FC) of 0.77. This finding directly links small granule cells to scavenger receptor-mediated functions, supporting their role as professional phagocytes. Scavenger receptors, including SRCR proteins, are known for their ability to bind and internalize diverse ligands, including pathogens, and their presence in small granule cells highlights a potential mechanism for pathogen recognition and clearance.

      Additionally, scavenger receptors are significantly expanded in oysters, as shown in Wang et al. (9). These receptors exhibit dynamic upregulation in hemocytes upon pathogen exposure, particularly following stimulation with pathogen-associated molecular patterns (PAMPs) such as lipopolysaccharide (LPS). This evidence suggests that SRCR proteins, including the one identified in our study, play a pivotal role in the phagocytic activities of hemocytes by facilitating pathogen recognition and internalization.

      We propose to add this paragraph (lines 610 to 618) in the Discussion: “Interestingly, our scRNA-seq analysis indicates that SGC (cluster 3) expresses the scavenger receptor cysteine-rich (SRCR) gene G3876, annotated as a low-density lipoprotein receptor-related protein, with a log2 fold change (log2FC) of 0.77, linking them to scavenger receptor-mediated pathogen recognition and clearance. This aligns with findings by Wang et al. (9), who demonstrated significant expansion and dynamic regulation of SRCR genes in response to pathogen-associated molecular patterns.”

      (12) I am not convinced by the added value of the lineage analysis and the manuscript could stand without it. There is no experimental validation to substantiate the filiation between the clusters. In addition, rooting the lineage to cluster 4 is poorly justified (enrichment in ribosomal transcripts). Cluster 6 is also enriched in ribosomal transcripts, and this enrichment can be caused by the low threshold used for the selection of cluster-specific genes (L2FC > 0.25). Finally, cluster 4 > VC and cluster 4 > SGC belong to the same lineage according to Figure 7F-H.

      We thank the reviewer for their detailed comments regarding the lineage analysis. We acknowledge the limitations in experimentally validating the proposed filiation between clusters, as hemocytes in Crassostrea gigas cannot currently be cultivated ex-vivo, and we lack the ability to isolate cells specifically from cluster 4 for further functional assays. Consequently, our lineage analysis is based solely on transcriptomic data and pseudo-time trajectory analysis.

      Hematopoietic stem cells (HSCs) are a population of stem cells that are largely cell-cycle-quiescent (G0 phase) with low biosynthetic activity. Upon stimulation or stress, HSCs proliferate and differentiate to produce all hemocyte lineages.

      Ribosomal proteins play a multifaceted role in preserving the balance between stem cell quiescence and activation. By ensuring precise regulation of protein synthesis, they allow stem cells to maintain their undifferentiated state while remaining poised for activation when needed. Furthermore, ribosomal proteins contribute to the cellular stress response, safeguarding stem cells from oxidative damage and other stressors that could compromise their functionality. Importantly, ribosomal biogenesis and the dynamic assembly of ribosomes provide a regulatory mechanism that fine-tunes the transition from self-renewal to differentiation, a critical feature of hematopoietic stem cells (HSCs) and other stem cell types. These mechanisms collectively highlight the indispensable role of ribosomal proteins in stem cell biology, underscoring their relevance to our study's findings.

      In vertebrates, the maintenance of hematopoietic stem cells (HSCs) and hematopoietic homeostasis is widely acknowledged to rely on the proper regulation of ribosome function and protein synthesis (10). This process necessitates the coordinated expression of numerous genes, including genes that encode ribosomal proteins (RP genes) and those involved in regulating ribosome biogenesis and protein translation. Disruptions or mutations in these critical genes are associated with the development of congenital disorders (11). Among these, Rpl22 (found in cluster 4 with a log2FC of 1.59) has been shown to play a pivotal role in HSC maintenance by balancing ribosomal protein paralog activity, which is critical for the emergence and function of HSCs (12).

      Regarding the justification for rooting the lineage to cluster 4, our decision was informed by the enrichment of ribosomal transcripts and functional annotations suggesting a role in translation and cell proliferation, consistent with a precursor-like state. The log2 fold-change (L2FC) threshold of > 0.25, while permissive, allowed us to include subtle but meaningful transcriptional shifts essential for resolving lineage transitions.
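      To make the threshold concrete, a log2FC cutoff of 0.25 corresponds to 2^0.25 ≈ 1.19-fold, that is, about 19% enrichment over the other clusters. The sketch below filters a marker table at that cutoff and at a stricter one. This shows how the choice of cutoff changes which genes count as cluster-specific. The gene names and values are hypothetical.

```python
THRESHOLD = 0.25  # log2FC cutoff for cluster-specific genes, as stated in the response

print(f"minimum fold enrichment: {2 ** THRESHOLD:.2f}")  # minimum fold enrichment: 1.19

# Hypothetical marker table for one cluster: gene -> log2FC versus all other clusters.
markers = {"geneA": 1.59, "geneB": 0.30, "geneC": 0.25, "geneD": 0.10}

# Strict '>' as in "L2FC > 0.25": geneC, exactly at the cutoff, is excluded.
selected = sorted(g for g, lfc in markers.items() if lfc > THRESHOLD)
print(selected)  # ['geneA', 'geneB']

# A stricter, purely illustrative cutoff keeps fewer genes.
strict = sorted(g for g, lfc in markers.items() if lfc > 0.5)
print(strict)  # ['geneA']
```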

      Finally, the lineage progression from cluster 4 to vesicular cells (VC), macrophage-like cells (ML), and ultimately small granule cells (SGC) is supported by trajectory analysis (Figure 7F-H), which consistently places VC and ML as intermediates in the differentiation process toward SGC. Although experimental validation is currently not feasible, these findings provide a conceptual framework for future investigations when cell isolation and functional validation tools become available.

      (13) The figures containing heatmaps (Figure 7, Figure 2, Figure S10) or with too many subpanels (Figure S5), as well as Table S5, are hardly readable.

      Thank you for highlighting the issues related to the clarity of the heatmaps (Figures 2, 7, and S10), the multi-panel figure (Figure S5), and Table S5. In response to your feedback, we have revised all of these elements to enhance readability and comprehension. Specifically, we increased font sizes, optimized color scales, and reorganized the layout of the subpanels to emphasize the key findings. We also updated Table S5 to ensure that the data are presented in a clear and easily interpretable format.

      We trust that these modifications address the concerns raised and improve the overall clarity of the figures and table.

      (14) A number of single-cell analyses are now available in different species and the authors allude to similar pathways/transcription factors being involved. Perhaps the authors could expand on this in the discussion section.

      Transcription factors involved in hematopoiesis, such as Tal1, Runx and GATA, are highly conserved across metazoans. Consistent with findings in other species, our dataset identifies these markers, reinforcing the evolutionary conservation of these pathways. Furthermore, these markers are also reported in the previous scRNA-seq dataset for C. hongkongensis (4), supporting the robustness of our molecular signatures. However, defining specific and robust markers for distinct hemocyte types remains an ambitious task, requiring additional validation in diverse biological and experimental contexts. This validation is beyond the scope of the present study.

      In addition, meaningful comparisons between scRNA-seq datasets are constrained by differences in annotation frameworks and the absence of standardized definitions for hemocyte subtypes. Harmonizing these datasets to enable robust cross-species comparisons is a critical challenge for future studies. Nonetheless, the insights provided by our dataset establish a strong foundation for such comparative analyses when these standardization efforts are realized.

      In crayfish (1), 16 transcriptomic clusters were identified, corresponding to three hemocyte types, with markers such as integrin prominently expressed in hyalinocytes, consistent with our identification of integrin-related genes in hemocytes. In shrimp (1), 11 transcriptomic clusters were described, with markers of hemocytes in immune-activated states, which we also observed in our dataset. For Anopheles gambiae (2), 8 transcriptomic clusters were identified, including clusters with high ribosomal activity, analogous to those we described in our study. Finally, in Bombyx mori (3), 20 transcriptomic clusters were reported, corresponding to five cytological hemocyte types. Transcription factors such as bHLH, myc, and runt were identified in granulocytes and oenocytoids, showing parallels with markers identified in our dataset.

      Despite these similarities, cross-species comparisons are hindered by variability in genome availability and annotation quality, which complicates the precise identification and functional characterization of genes across datasets. Notably, we did not detect pro-phenoloxidase genes in our dataset, unlike shrimp and crayfish, suggesting potential species-specific differences in immune mechanisms.

      Regarding the previously published C. hongkongensis scRNA-seq dataset (4), we observe overlap in markers such as runx and GATA. However, direct comparisons remain limited due to differences in dataset annotations and definitions of hemocyte subtypes. This underscores the need for standardized frameworks to facilitate cross-study comparisons. While we emphasize that robust cross-species validation was beyond the scope of this study, our findings contribute valuable insights into the molecular signatures of oyster hemocytes and provide a framework for future comparative research.

      We have expanded our discussion to include comparisons with available scRNA-seq data from other invertebrate species (lines 747 to 760).

      Minor comments:

      (1) Figure 2A-D: to increase the readability of the figure, the authors should display only the GO terms mentioned in the text and keep the full list in supplementary data.

      To enhance the fluidity of the Results section, we have redesigned the KEGG/RBGOA figure to present the results for each cluster in an integrated manner (see Figures 2A and 2B).

      (2) Line 223: the authors mention that cluster 1 is characterized by its morphology without providing an explanation or evidence.

      We have revised the description of Cluster 1 to remove references to morphology, ensuring consistency with the data presented at this stage of the manuscript (lines 227 to 229): “Cluster 1, comprising 27.6% of cells, is characterized by GO terms related to myosin complex, lamellipodium, membrane and actin cytoskeleton remodelling, as well as phosphotransferase activity.”

      (3) Line 306: the authors mentioned expression levels and associated them with Log2FC, which represents an enrichment, not the level of expression.

      Thank you for pointing this out. We agree that log2FC represents enrichment rather than absolute expression levels. We have revised the text in the manuscript to clarify this distinction (line 309). The corrected text now states that log2FC reflects the degree of enrichment or depletion of a gene in a specific cluster relative to others, rather than its absolute expression level.
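      A toy example makes the distinction concrete. In the sketch below, two hypothetical genes differ a thousandfold in absolute expression but have the same log2FC, because log2FC depends only on the ratio between a cluster and the remaining cells. The helper is a deliberately simplified ratio of means without a pseudocount. Real scRNA-seq pipelines differ in normalization and pseudocount handling.

```python
import math

def log2_fold_change(mean_in_cluster, mean_other_clusters):
    """Simplified log2FC: log2 ratio of mean expression (illustrative, no pseudocount)."""
    return math.log2(mean_in_cluster / mean_other_clusters)

# Hypothetical genes with very different absolute expression levels.
low_gene = log2_fold_change(2.0, 1.0)          # weakly expressed gene
high_gene = log2_fold_change(2000.0, 1000.0)   # strongly expressed gene

print(low_gene, high_gene)  # 1.0 1.0
```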

      (4) Figure 4B: the figure shows the distribution of all hemocyte subgroups for each fraction. To better appreciate the distribution of the subgroups in the different fractions, it would be good to have the number of cells of each subtype in the fractions.

      We thank the reviewer for their suggestion to include the number of cells of each subtype in the fractions. While we do not have the exact total number of cells per fraction, we systematically performed hemocyte counts for each fraction as part of our methodology. These counts provide a robust estimation of hemocyte distributions across fractions.

      Including these counts in the figure could be an alternative approach; however, we believe it would not significantly enhance the interpretability of the data, as the focus of this analysis is on the relative proportions of hemocyte subtypes rather than absolute numbers. The current representation provides a clear and concise overview of subtype distribution patterns, which aligns with the goals of the study.

      Nevertheless, if the reviewer considers it essential, we are open to integrating the hemocyte counts into the figure or supplementing the information in the text or supplementary materials to provide additional context.

      (5) Lines 487-488: the authors state that Monocle3 can deduce the differentiation pathway from mRNA splice variants. I did not find this information in the publication associated with the statement.

      Thank you for pointing this out. We acknowledge the inaccuracy in our statement regarding Monocle3's capabilities. Monocle3 does not deduce differentiation pathways based on mRNA splice variants, as was erroneously suggested in the manuscript. Instead, Monocle3 performs trajectory inference using gene expression profiles. It calculates distances between cells based on their transcriptomic profiles, where cells with similar profiles are positioned closer together, and those with distinct profiles are farther apart. This method enables the construction of potential differentiation trajectories by identifying paths between transcriptionally related cells.
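      Monocle3 itself learns a principal graph over the reduced-dimension embedding. It then measures pseudotime as distance along that graph from root nodes chosen by the analyst. The toy sketch below is a simplified stand-in and is not Monocle3's algorithm. It links hypothetical cell expression profiles with a minimum spanning tree (Prim's algorithm) over Euclidean distances and defines pseudotime as the path length from a chosen root cell. The sketch illustrates the principle described above: transcriptionally similar cells end up close together, and the resulting ordering depends on the choice of root.

```python
import math

def euclidean(a, b):
    """Distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def minimum_spanning_tree(points):
    """Prim's algorithm; returns adjacency {node: [(neighbour, weight), ...]}."""
    n = len(points)
    in_tree = {0}
    adj = {i: [] for i in range(n)}
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                d = euclidean(points[i], points[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        adj[i].append((j, d))
        adj[j].append((i, d))
        in_tree.add(j)
    return adj

def pseudotime(points, root):
    """Path length along the tree from the root cell to every other cell."""
    adj = minimum_spanning_tree(points)
    dist = {root: 0.0}
    stack = [root]
    while stack:
        node = stack.pop()
        for neighbour, weight in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + weight
                stack.append(neighbour)
    return [dist[i] for i in range(len(points))]

# Hypothetical cells described by the expression of two genes; cell 2 is a branch point.
cells = [(0, 0), (1, 0), (2, 0), (3, 0), (2, 1)]
print(pseudotime(cells, root=0))  # [0.0, 1.0, 2.0, 3.0, 3.0]
print(pseudotime(cells, root=3))  # [3.0, 2.0, 1.0, 0.0, 2.0]
```

Changing the root reverses the ordering without changing the tree. For this reason, the choice of the starting cluster has to be justified on biological grounds.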

      We have revised the text in the manuscript to accurately describe this process and remove the incorrect reference to mRNA splice variants (lines 495 to 497).

      (6) Figures 6C-H display heatmaps with two columns representing the beginning and the end of the predicted lineage. It would be more informative to show the whole path presented in Figure S10.

      Thank you for pointing out that Figures 7C–H currently only show the beginning and end of the predicted lineage, limiting the clarity of the intermediate stages. In response to your suggestion, we have revised these figures to include the full trajectory as presented in Figure S10, ensuring that the intermediate transitions are more clearly visualized. We believe these modifications offer a more comprehensive overview of the entire lineage and enhance the interpretability of our results.

      Bibliography:

      (1) F. Xin, X. Zhang, Hallmarks of crustacean immune hemocytes at single-cell resolution. Front. Immunol. 14 (2023).

      (2) H. Kwon, M. Mohammed, O. Franzén, J. Ankarklev, R. C. Smith, Single-cell analysis of mosquito hemocytes identifies signatures of immune cell subtypes and cell differentiation. eLife 10, e66192 (2021).

      (3) M. Feng, L. Swevers, J. Sun, Hemocyte Clusters Defined by scRNA-Seq in Bombyx mori: In Silico Analysis of Predicted Marker Genes and Implications for Potential Functional Roles. Front. Immunol. 13 (2022).

      (4) J. Meng, G. Zhang, W.-X. Wang, Functional heterogeneity of immune defenses in molluscan oysters Crassostrea hongkongensis revealed by high-throughput single-cell transcriptome. Fish & Shellfish Immunology 120, 202–213 (2022).

      (5) C. Peñaloza, A. P. Gutierrez, L. Eöry, S. Wang, X. Guo, A. L. Archibald, T. P. Bean, R. D. Houston, A chromosome-level genome assembly for the Pacific oyster Crassostrea gigas. GigaScience 10, giab020 (2021).

      (6) R. D. Rosa, A. Santini, J. Fievet, P. Bulet, D. Destoumieux-Garzón, E. Bachère, Big Defensins, a Diverse Family of Antimicrobial Peptides That Follows Different Patterns of Expression in Hemocytes of the Oyster Crassostrea gigas. PLOS ONE 6, e25594 (2011).

      (7) Y. Li, J. Sun, Y. Zhang, M. Wang, L. Wang, L. Song, CgRel involved in antibacterial immunity by regulating the production of CgIL17s and CgBigDef1 in the Pacific oyster Crassostrea gigas. Fish & Shellfish Immunology 97, 474–482 (2020).

      (8) Evidence of a bactericidal permeability increasing protein in an invertebrate, the Crassostrea gigas Cg-BPI. PNAS. https://www.pnas.org/doi/abs/10.1073/pnas.0702281104.

      (9) L. Wang, H. Zhang, M. Wang, Z. Zhou, W. Wang, R. Liu, M. Huang, C. Yang, L. Qiu, L. Song, The transcriptomic expression of pattern recognition receptors: Insight into molecular recognition of various invading pathogens in Oyster Crassostrea gigas. Developmental & Comparative Immunology 91, 1–7 (2019).

      (10) R. A. J. Signer, J. A. Magee, A. Salic, S. J. Morrison, Haematopoietic stem cells require a highly regulated protein synthesis rate. Nature 509, 49–54 (2014).

      (11) A. Narla, B. L. Ebert, Ribosomopathies: human disorders of ribosome dysfunction. Blood 115, 3196–3205 (2010).

      (12) Y. Zhang, A.-C. E. Duc, S. Rao, X.-L. Sun, A. N. Bilbee, M. Rhodes, Q. Li, D. J. Kappes, J. Rhodes, D. L. Wiest, Control of Hematopoietic Stem Cell Emergence by Antagonistic Functions of Ribosomal Protein Paralogs. Developmental Cell 24, 411–425 (2013).

      Reviewer #2 (Public review):

      Summary:

      This work provides a comprehensive understanding of cellular immunity in bivalves. To precisely describe the hemocytes of the oyster C. gigas, the authors morphologically characterized seven distinct cell groups, which they then correlated with single-cell RNA sequencing analysis, also resulting in seven transcriptional profiles. They employed multiple strategies to establish relationships between each morphotype and the scRNAseq profile. The authors correlated the presence of marker genes from each cluster identified in scRNAseq with hemolymph fractions enriched for different hemocyte morphotypes. This approach allowed them to correlate three of the seven cell types, namely hyalinocytes (H), small granule cells (SGC), and vesicular cells (VC). A macrophage-like (ML) cell type was correlated through the expression of macrophage-specific genes and its capacity to produce reactive oxygen species. Three other cell types correspond to blast-like cells, including an immature blast cell type from which distinct hematopoietic lineages originate to give rise to H, SGC, VC, and ML cells. Additionally, ML cells and SGCs demonstrated phagocytic properties, with SGCs also involved in metal homeostasis. On the other hand, H cells, nongranular cells, and blast cells expressed antimicrobial peptides. This study thus provides a complete landscape of oyster hemocytes with functional validation linked to immune activities. This resource will be valuable for studying the impact of bacterial or viral infections in oysters.

      Strengths:

      The main strength of this study lies in its comprehensive and integrative approach, combining single-cell RNA sequencing, cytological analysis, cell fractionation, and functional assays to provide a robust characterization of hemocyte populations in Crassostrea gigas.

      (1) The innovative use of marker genes, quantifying their expression within specific cell fractions, allows for precise annotation of different cellular clusters, bridging the gap between morphological observations and transcriptional profiles.

      (2) The study provides detailed insights into the immune functions of different hemocyte types, including the identification of professional phagocytes, ROS-producing cells, and cells expressing antimicrobial peptides.

      (3) The identification and analysis of transcription factors specific to different hemocyte types and lineages offer crucial insights into cell fate determination and differentiation processes in oyster immune cells.

      (4) The authors significantly advance the understanding of oyster immune cell diversity by identifying and characterizing seven distinct hemocyte transcriptomic clusters and morphotypes.

      These strengths collectively make this study a significant contribution to the field of invertebrate immunology, providing a comprehensive framework for understanding oyster hemocyte diversity and function.

      Weaknesses:

      (1) The authors performed scRNAseq/lineage analysis and cytological analysis on oysters from two different sources. The methodology of the study raises concerns about the consistency of the sample and the variability of the results. The specific post-processing of hemocytes for scRNAseq, such as cell filtering, might also affect cell populations or gene expression profiles. It's unclear if the seven hemocyte types and their proportions were consistent across both samples. This inconsistency may affect the correlation between morphological and transcriptomic data.

      We thank the reviewer for highlighting the importance of sample consistency and potential variability, and we acknowledge the need for clarification regarding the use of oysters from two different sources.

      Oysters from La Tremblade (known pathogen-free, reared under standardized conditions) were used to establish the hemocyte transcriptomic atlas through scRNA-seq and for cytological analyses. Oysters from the Thau Lagoon (Bouzigues) were used for cytological, functional, and fractionation experiments. These oysters were sampled during non-epidemic periods and monitored under Ifremer’s microbiological surveillance to ensure pathogen-free status.

      The cytological results (hemocytograms) presented in Figure 3 and Supplementary Figure S3 were derived from Thau Lagoon oysters. To clarify, we updated Table 3 in Figure 3 and Supplementary Figure S3 to explicitly display hemocyte counts for oysters from both La Tremblade and Thau Lagoon. These data confirm consistent proportions of hemocyte types across both sources, with no significant differences (p > 0.05).

      Hemocyte isolation and filtering protocols were rigorously optimized to preserve cell viability and morphology during scRNA-seq library preparation. Viability assays and cytological evaluations confirmed that these procedures did not significantly alter hemocyte populations or their proportions. Sample processing times were minimized to ensure that the scRNA-seq results accurately reflect the native state of the hemolymph.

      Taken together, our results confirm that variability between oyster sources or methodological processes did not compromise our findings. This ensures that the correlations between morphological and transcriptomic data are reliable and robust.

      (2) The authors claim to use pathogen-free adult oysters (lines 95 and 119), but no supporting data is provided. It's unclear if the oysters were tested for bacterial and viral contaminations, particularly Vibrio and OsHV-1 μVar herpesvirus.

      The oysters used in this study were sourced from two distinct origins. First, the animals (18 months old) used for scRNA-seq and cytological analyses were obtained from the Ifremer controlled farm located in La Tremblade, France (GPS coordinates: 45.7981624714465, -1.150171788447683). This facility exclusively produces standardized oysters bred under controlled conditions with filtered seawater, entirely isolated from known environmental pathogens. The oysters from this source are certified “pathogen-free” upon arrival at the laboratory, following Ifremer's stringent quality control protocols. We have replaced the term “pathogen-free” with “known pathogen-free” (line 123) to accurately reflect the animals' true status.

      Second, for the fractionation experiments and functional tests, oysters were either sourced from the aforementioned Ifremer farm or from a producer located in the Thau Lagoon, France (GPS coordinates: 43.44265228308842, 3.6359883059292057). The Thau Lagoon is subject to comprehensive environmental and microbiological surveillance by the Ifremer monitoring network and the regional veterinary laboratory. For these experiments, we specifically selected oysters aged 18 months - an age associated with reduced susceptibility to OsHV-1 μVar herpesvirus - and ensured that sampling occurred outside of any detected epidemic periods. Furthermore, prior to experimentation, hemocyte samples from all oysters were examined. Oysters showing signs of contamination or exhibiting abnormal hemocyte profiles were excluded from the study.

      These measures ensured that the oysters used in this work were of high health status and minimized the likelihood of bacterial or viral contamination, including Vibrio and OsHV-1 μVar.

      (3) The KEGG and Gene Ontology analyses, while informative, are very descriptive and lack interpretation. The use of heatmaps with dendrograms for grouping cell clusters and GO terms is not discussed in the results, missing an opportunity to explore cell-type relationships. The changing order of cell clusters across panels B, C, and D in Figure 2 makes it challenging to correlate with panel A and to compare across different GO term categories. The dendrograms suggest proximity between certain clusters (e.g., 4 and 1) across different GO term types, implying similarity in cell processes, but this is not discussed. Grouping GO terms as in Figure 2A, rather than by dendrogram, might provide a clearer visualization of main pathways. Lastly, a more integrated discussion linking GO term and KEGG pathway analyses could offer a more comprehensive view of cell type characteristics. The presentation of scRNAseq results lacks depth in interpretation, particularly regarding the potential roles of different cell types based on their transcriptional profiles and marker genes. Additionally, some figures (2B, C, D, and 7C to H) suffer from information overload and small size, further hampering readability and interpretation.

      Thank you for your valuable suggestions regarding the presentation and interpretation of our KEGG and Gene Ontology (GO) analyses. In response, we revised Figure 2 to enhance clarity and provide deeper insights into cell-type relationships and biological processes.

      The revised Figure 2 reorganizes the GO term analysis into a more intuitive layout, grouping related biological processes and pathways in a structured manner. This approach replaces the dendrogram organization and provides a clearer visualization of key pathways for each cell cluster.

      (4) The pseudotime analysis presented in the study provides modest additional information to what is already manifest from the clustering and UMAP visualization. The central and intermediate transcriptomic profile of cluster 4 relative to other clusters is apparent from the UMAP and the expression of shared marker genes across clusters (as shown in Figure 1D). The statement by the authors that 'the two types of professional phagocytes belong to the same granular cell lineage' (lines 594-596) should be formulated with more caution. While the pseudotime trajectory links macrophage-like (ML) and small granule-like (SGC) cells, this doesn't definitively establish a direct lineage relationship. Such trajectories can result from similarities in gene expression induced by factors other than lineage relationships, such as responses to environmental stimuli or cell cycle states. To conclusively establish this lineage relationship, additional experiments like cell lineage tracing would be necessary, if such tools are available for C. gigas.

      We appreciate the reviewer’s detailed feedback on the pseudotime analysis and its interpretation. While we acknowledge that the clustering and UMAP visualization provide valuable insights, the pseudotime analysis offers a complementary approach by highlighting significantly expressed genes, including key transcription factors, that might otherwise be overlooked in differential expression analysis based solely on Log2FC between clusters. In our study, the pseudotime analysis revealed transcription factors known to play crucial roles in hemocyte differentiation, providing additional depth to our understanding of hemocyte lineage relationships and functional specialization.

      Regarding the statement on lines 594-596, we agree that the evidence provided by pseudotime trajectories does not definitively establish a direct lineage relationship between macrophage-like (ML) and small granule-like (SGC) cells. Instead, these trajectories suggest potential developmental connections that warrant further investigation. We propose the following revised sentence (lines 616 to 618):

      "The pseudotime trajectory linking macrophage-like (ML) and small granule-like (SGC) cells suggests a potential developmental relationship within the granular cell lineage; however, this hypothesis requires further validation."

      We also concur with the reviewer that additional experiments, such as cell lineage tracing, would be necessary to definitively establish this relationship. Unfortunately, the long-term cultivation of hemocytes in C. gigas is currently not feasible. However, we are planning to develop FACS-based approaches to separate the seven hemocyte subtypes, which will allow us to refine their ontology and explore their potential lineage relationships more precisely.

      (6) Given the mention of herpesvirus as a major oyster pathogen, the lack of discussion on genes associated with antiviral immunity is a notable omission. While KEGG pathway analysis associated herpesvirus with cluster 1, the specific genes involved are not elaborated upon.

      Thank you for your valuable observation regarding the lack of discussion on genes associated with antiviral immunity, particularly in the context of herpesvirus infection. The KEGG pathway analysis indeed identified a weak signature associated with herpesvirus in Cluster 1, primarily involving genes encoding beta integrins. In humans, beta integrins have been described as receptors facilitating herpesvirus entry (1). However, in the naive oysters used in this study, the KEGG signature was subtle, likely reflecting the absence of active viral infection. Additionally, beta integrins are multifunctional molecules that also play critical roles in processes such as cell adhesion, a function attributed to hyalinocytes, as highlighted in our results.

      Given the naive status of the oysters and the weak antiviral signature observed, we chose not to discuss these findings in detail in this study. However, ongoing work in our laboratory aims to further investigate the specific hemocyte populations targeted by OsHV-1, which may shed light on the role of integrins in antiviral immunity in oysters.

      We hope this clarifies our approach and the context of the KEGG findings. Thank you for bringing this important perspective to our attention.

      (7) The discussion misses an opportunity for comparative analysis with related species. Specifically, a comparison of gene markers and cell populations with Crassostrea hongkongensis, could highlight similarities and differences across systems.

      In response to the reviewer’s comment, we have added a comparative analysis between C. hongkongensis and C. gigas hemocyte populations, situating our findings within the broader context of invertebrate immune cell diversity and specialization (lines 747 to 760).

      Reviewer #2 (Recommendations for the authors):

      (1) Lines 92-93: The authors should add references associated with transcriptomic studies of C. gigas hemocytes.

      Thank you for pointing this out. In the revised manuscript, we have added references to previous transcriptomic studies of C. gigas hemocytes (line 83).

      (2) Line 121 and 127: The authors should clarify whether 3,000 represents the number of cells loaded or their target for analysis.

      The number of cells processed was optimized to minimize the occurrence of doublets during scRNA-seq. Following the 10x Genomics Chromium guidelines, we loaded 4,950 cells to recover a target of 3,000 cells, with a doublet rate of 2.4%, below our threshold of 2.5%. This information has been added on line 125 of the document. As reported in Supplementary Table S1, the estimated number of cells recovered after STARsolo alignment was 2,937, close to the 3,000-cell target, which supports the reliability of the single-cell transcriptomic data.
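      As a rough check of the loading arithmetic above, the Python sketch below assumes the approximately linear relationships tabulated in the 10x Genomics Chromium Single Cell 3' user guide: about 1,650 cells loaded per 1,000 cells recovered, and about 0.8% multiplets per 1,000 cells recovered. These constants are our assumption, not values taken from the manuscript; with them, a 3,000-cell target reproduces the 4,950 loaded cells and the 2.4% doublet rate quoted above.

```python
# Sketch of the 10x Chromium loading arithmetic (assumed constants, see lead-in).
LOADED_PER_1000_RECOVERED = 1650   # assumption: cells loaded per 1,000 recovered
MULTIPLET_RATE_PER_1000 = 0.008    # assumption: ~0.8% multiplets per 1,000 recovered


def cells_to_load(target_recovered: int) -> int:
    """Number of cells to load in order to recover `target_recovered` cells."""
    return target_recovered * LOADED_PER_1000_RECOVERED // 1000


def expected_multiplet_rate(recovered: int) -> float:
    """Approximate multiplet (doublet) fraction for a given number of recovered cells."""
    return MULTIPLET_RATE_PER_1000 * recovered / 1000


if __name__ == "__main__":
    target = 3000
    print(cells_to_load(target))
    print(f"{expected_multiplet_rate(target):.1%}")
```

      Because the multiplet rate scales with the number of recovered cells, lowering the target is the main lever for reducing doublets in this approximation.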

      (3) Line 129: "Supp. Table 1" in the text and "Supp. Table S1" in the figure title should be edited.

      The inconsistency between "Supp. Table 1" in the text and "Supp. Table S1" in the figure title has been corrected for uniformity throughout the manuscript (line 134).

      (4) Line 138-139: The authors should clarify that the heatmap displays the top 10 positively enriched marker genes for each cluster, as identified by Seurat's differential expression analysis. It is important to note that the analysis does not explicitly show under-represented transcripts, but rather highlights the contrast between cluster-specific overexpressed genes and their lower expression in other clusters.

      We have clarified that the heatmap displays the top 10 positively enriched marker genes for each cluster, as identified by Seurat's differential expression analysis, and that the analysis highlights cluster-specific overexpressed genes rather than explicitly showing under-represented transcripts (lines 143 - 145).
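      To illustrate the selection logic described above, here is a minimal Python sketch that keeps only significant, positively enriched genes and returns the top n per cluster. The row format is a hypothetical flattening of a differential-expression table; the column names are modeled on Seurat's FindAllMarkers output (avg_log2FC, p_val_adj, cluster, gene), and the 0.05 cutoff and gene IDs are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict


def top_positive_markers(markers, n=10, padj_max=0.05):
    """Return the top-n positively enriched marker genes per cluster.

    `markers` is a list of dicts with keys "cluster", "gene",
    "avg_log2FC" and "p_val_adj" (hypothetical input format).
    """
    by_cluster = defaultdict(list)
    for row in markers:
        # Keep only significant genes that are over-expressed in the cluster.
        if row["avg_log2FC"] > 0 and row["p_val_adj"] < padj_max:
            by_cluster[row["cluster"]].append(row)
    result = {}
    for cluster, rows in by_cluster.items():
        ranked = sorted(rows, key=lambda r: r["avg_log2FC"], reverse=True)
        result[cluster] = [r["gene"] for r in ranked[:n]]
    return result


if __name__ == "__main__":
    example = [  # hypothetical gene IDs and statistics
        {"cluster": 1, "gene": "G1", "avg_log2FC": 2.0, "p_val_adj": 0.001},
        {"cluster": 1, "gene": "G2", "avg_log2FC": -1.0, "p_val_adj": 0.001},
        {"cluster": 1, "gene": "G3", "avg_log2FC": 3.0, "p_val_adj": 0.5},
        {"cluster": 1, "gene": "G4", "avg_log2FC": 1.0, "p_val_adj": 0.01},
        {"cluster": 2, "gene": "G5", "avg_log2FC": 0.8, "p_val_adj": 0.02},
    ]
    print(top_positive_markers(example, n=2))
```

      Under-expressed genes (negative log2FC) are dropped by construction, which is why such a heatmap contrasts cluster-specific over-expression rather than listing under-represented transcripts.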

      (5) Figure 1: The authors should consider improving or potentially removing Figure 1C. The gene IDs are not readable due to their small size, which significantly reduces the informative value of the figure. In addition, the data presented in this heatmap is largely redundant with the more informative and readable dot plot in Figure 1D, which shows both expression levels and the percentage of cells expressing each gene.

      Thank you for your suggestion regarding Figure 1C. In the revised manuscript, we have removed the original panel C from the main figure and transferred it to Supplementary Figure S1K, which improves readability while retaining the relevant data. We have also renumbered the remaining panels for clarity, with the former panel D now designated as panel C. We believe these adjustments address the reviewer’s concerns and streamline the presentation of the data.

      (6) Table 1: The authors should clarify in the legend the statistical significance criteria (adjusted p-value) for the genes listed.

      As requested, we have added the adjusted p-value threshold (adj. p-value < 0.05) to the legend of Table 1.

      (7) Line 188: The authors should align the text description of the KEGG pathways in cluster 7 with Figure 2A, describing Wnt signaling pathway and clarifying the terminology "endosome pathway" to ensure consistency.

      In the revised text, we have aligned our description with Figure 2A by explicitly mentioning the Wnt signaling pathway in cluster 7 (lines 193 to 194).

      The endo-lysosomal pathway encompasses a series of membrane-bound compartments and trafficking events responsible for the uptake of macromolecules from the extracellular environment, their subsequent sorting in endosomes, and eventual degradation in lysosomes. This pathway is tightly regulated, ensuring not only the breakdown of macromolecules but also the recycling of membrane components and signaling receptors essential for maintaining cellular homeostasis (2). In our study, the KEGG signatures of cluster 7 highlight the involvement of the endo-lysosomal pathway.

      (8) Line 223: The authors should revise the description of cluster 1, avoiding references to morphology at this point in the manuscript, as no morphological data has been presented yet.

      We have revised the description of Cluster 1 to remove references to morphology, ensuring consistency with the data presented at this stage of the manuscript (lines 227 to 229): “Cluster 1, comprising 27.6% of cells, is characterized by GO terms related to the myosin complex, lamellipodium, membrane and actin cytoskeleton remodelling, as well as phosphotransferase activity.”

      (9) Figure 2: The authors should revise Figure 2 to improve the clarity. For Figure 2A, they should address the redundancy in the "Global and overview maps" category by removing overlapping pathways such as carbon metabolism and biosynthesis of amino acids, which are likely represented in more specific metabolic categories (glycolysis, pentose). They could consider grouping similar pathways together, such as combining "Amino acid metabolism" with "Metabolism of other amino acids," and separating metabolic pathways from cellular processes for easier interpretation. They should also address the surprising absence of certain expected pathways like lipid metabolism, nucleotide metabolism, and cofactor/vitamin metabolism, as well as cellular processes such as cell growth and chromatin modeling. Even if these pathways are not enriched in specific clusters, mentioning their absence could provide valuable context for the reader.

      In the revised version of the manuscript, we propose a new representation to facilitate understanding and improve the clarity of the data presentation.

      (10) For Figures 2B, C, and D, the authors should significantly increase the font size of text and numbers, ensuring readability at 100% scale in PDF format. They could also add labels directly on each graph to clearly indicate the type of GO terms represented, (Biological Process, Cellular Component, or Molecular Function).

      In the revised version of the manuscript, we propose a new representation to facilitate understanding and improve the clarity of the data presentation.

      (11) Line 247-250: The authors should revise their description of cell types to follow the same order as presented in Figure 3A.

      We have revised the description of cell types in the manuscript to follow the same order as presented in Figure 3A, as requested.

      (12) Line 265-266: The authors should develop the significance of the nucleo-cytoplasmic ratio in hemocyte morphology and identification.

      We thank the editor for bringing this to our attention and apologize for the discrepancy between the terminology used in the text and the results presented in Figure 3. The text refers to the nuclear-to-cytoplasmic ratio (N/C), while the figure mistakenly displays the inverse ratio, the cytoplasmic-to-nuclear ratio (C/N). We recognize that this inversion may cause confusion and will ensure consistency between the text and the figure.

      To address this, we propose correcting the figure legend and labels in Figure 3 to align with the terminology used in the text (N/C ratio). This will prevent confusion and maintain clarity throughout the manuscript.

      The nuclear-to-cytoplasmic (N:C) ratio, also known as the nucleus:cytoplasm ratio or N/C ratio, is a well-established measurement in cell biology that reflects the relative size of the nucleus to the cytoplasm. This ratio is frequently used as a morphologic feature in the diagnosis of atypia and malignancy in human cells, underscoring its diagnostic value. In the context of our study, we use the N:C ratio to provide a more precise and quantitative description of hemocyte types in Crassostrea gigas. Specifically, the N:C ratio allows us to distinguish between different hemocyte morphotypes, such as blasts and granular cells, and to enrich the characterization of their functional specialization. This quantitative measure supports the morphological classification and enhances the reproducibility and clarity of hemocyte identification.
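      Because the text and Figure 3 used reciprocal ratios, a small Python sketch may help make the relationship explicit. It assumes that the cytoplasmic area is the whole-cell area minus the nuclear area, measured on 2D images; the function names and example areas are hypothetical and are not taken from the authors' image-analysis pipeline.

```python
def nc_ratio(cell_area: float, nucleus_area: float) -> float:
    """Nuclear-to-cytoplasmic (N/C) ratio from 2D areas."""
    # Assumption: cytoplasmic area = whole-cell area minus nuclear area.
    cytoplasm_area = cell_area - nucleus_area
    if nucleus_area <= 0 or cytoplasm_area <= 0:
        raise ValueError("need 0 < nucleus_area < cell_area")
    return nucleus_area / cytoplasm_area


def cn_ratio(cell_area: float, nucleus_area: float) -> float:
    """Cytoplasmic-to-nuclear (C/N) ratio, i.e. the inverse of N/C."""
    cytoplasm_area = cell_area - nucleus_area
    if nucleus_area <= 0 or cytoplasm_area <= 0:
        raise ValueError("need 0 < nucleus_area < cell_area")
    return cytoplasm_area / nucleus_area


if __name__ == "__main__":
    # Hypothetical areas in square micrometres for a single hemocyte.
    cell, nucleus = 100.0, 25.0
    print(nc_ratio(cell, nucleus), cn_ratio(cell, nucleus))
```

      A high N/C value (large nucleus, little cytoplasm) corresponds to a low C/N value, so plotting one while describing the other inverts the apparent ranking of morphotypes.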

      (13) Line 286-294: The authors should review and correct the legend for Figure 3. It seems that the description of results related to Figure 3C has been mistakenly inserted into the legend.

      We thank the reviewer for pointing out this issue with the legend of Figure 3. The description of results related to Figure 3C has now been removed from the legend. The revised legend focuses solely on the figure elements, improving clarity and consistency. We believe this adjustment addresses the reviewer's comment effectively.

      (14) Figure 3: The authors should revise the legend for Figure 3A to provide more detailed and explicit descriptions of the "Size, shape and particularities" of the ML, SGC, BGC, and VC hemocyte types.

      We thank the reviewer for their insightful suggestion to provide more explicit descriptions in the legend for Figure 3A. We have revised the legend to include detailed explanations of the "Size, shape, and particularities" for the ML, SGC, BGC, and VC hemocyte types. Specifically, we have clarified that size refers to the average granule diameter, shape describes the morphology of the granules (e.g., spherical or elongated), and particularities highlight distinguishing features such as granule color or fluorescence properties observed under specific staining or imaging conditions. We believe this updated legend provides the level of detail requested and enhances the clarity of the figure (lines 294 - 297).

      (15) Figure 4: The authors should clarify the method used for calculating relative gene expression in Figure 4A and Figure 6. They should explicitly state in the figure legend that the expression was normalized to the Cg-rps6 reference gene, as mentioned in line 835. The authors should also provide details on the calculation method used (e.g., 2-ΔCt method) and confirm whether the reference gene was expressed at similar levels across all clusters.

      We thank the reviewer for pointing out the need for additional clarity regarding the calculation of relative gene expression in Figures 4A and 6. To address this, we have revised the legends for both figures to explicitly state that gene expression levels were normalized to the reference gene Cg-rps6 and calculated using the 2^-ΔCt method. We have also confirmed that Cg-rps6 was stably expressed across all hemocyte clusters and explicitly mentioned this in the revised legends. These changes ensure greater transparency and address the reviewer’s concerns (lines 342 to 346).
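      For readers less familiar with the 2^-ΔCt method mentioned above, a minimal Python sketch is given below. It assumes ΔCt is computed per sample as Ct(target) minus Ct(Cg-rps6); the Ct values are hypothetical, and replicate averaging and amplification-efficiency correction are not shown.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Relative expression by the 2^-ΔCt method.

    ΔCt = Ct(target) - Ct(reference); here the reference would be Cg-rps6.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values for one target gene and Cg-rps6 in one sample.
    print(relative_expression(ct_target=25.0, ct_reference=20.0))
```

      Each cycle of difference corresponds to a two-fold change, which is why stable expression of the reference gene across clusters matters: any drift in Ct(Cg-rps6) propagates directly into the relative values.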

      (16) The authors could consider removing or modifying Figure 4B, as it appears to be redundant with Figure 3C. Both figures show the average percentage of each hemocyte type in the seven Percoll gradient fractions.

      We thank the reviewer for highlighting potential redundancy between Figures 3C and 4B. While both figures present the distribution of hemocyte types across Percoll gradient fractions, Figure 4B serves a distinct and critical purpose in the manuscript. Specifically, it provides the numerical data necessary to understand the correlations shown in Figure 4A, where we analyze the relationship between gene expression levels and the distribution of hemocyte types. These detailed percentages are essential for interpreting the statistical robustness and biological relevance of the correlation matrix, which could not be derived solely from the qualitative visualization in Figure 3C.
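      To show why the per-fraction percentages are needed, the sketch below correlates a gene's expression with the percentage of one hemocyte type across the seven Percoll fractions. The response does not state which correlation coefficient underlies Figure 4A, so Pearson's r is an assumption here, and all values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical data: relative expression of one gene and percentage of one
    # hemocyte type in each of the seven Percoll fractions.
    expression = [0.1, 0.4, 0.9, 1.6, 1.2, 0.5, 0.2]
    percent_type = [5, 12, 30, 52, 40, 18, 6]
    print(round(pearson(expression, percent_type), 3))
```

      Each cell of such a correlation matrix needs the seven paired values, which a qualitative bar chart alone does not provide.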

      (17) Figure 5: The authors should address the redundancy between Figure S7B and Figure 5B, as they appear to present the same data. In Figure S7B, "SGC" is incorrectly abbreviated as "G".

      In the revised version of the manuscript, we have addressed the redundancy between the two figures and corrected the incorrect abbreviation of SGC.

      (18) Line 412: The authors should correct the typographical error, changing "Pecoll" to "Percoll".

      In the revised version of the manuscript, we have corrected this typographical error (line 417).

      (19) Line 417: The statement about the inhibitor apocynin likely refers to Figure 5D, not Figure 5C.

      In the revised version of the manuscript, we have corrected this reference error to accurately refer to Figure 5D (line 422).

      (20) Line 441-444: The authors should provide references to support their annotation of cluster 1 as macrophage-like cells based on macrophage-specific genes. These references should cite established literature on known macrophage gene markers, particularly in bivalves or related species if available. They need to clarify whether specific gene markers exist for each of the hemocyte morphotypes they have identified. If such markers are known from previous studies, they should be mentioned and referenced.

      We propose to modify lines 446 to 449 to address the reviewer's concerns. Cluster 1, which we have termed "macrophage-like" due to its pronounced phagocytic activity and reactive oxygen species (ROS) production, is enriched in Angiopoietin-1 receptor expression (Table 1). Angiopoietin receptors belong to the Tie receptor family, which is expressed in a subset of macrophages known as Tie2-expressing monocytes (TEMs) in humans (35). While our analysis reveals a strong overexpression of the Angiopoietin-1 receptor, we acknowledge that this receptor is not an exclusive marker for macrophages.

      In bivalves, including oysters, no definitive molecular markers have been established for macrophage-like cells as they are defined functionally in this study. Consequently, the identification of such cells relies on their functional characteristics rather than strict marker expression. To clarify, we propose the following revision to the sentence:

      Furthermore, this cluster expresses macrophage-related genes, including the macrophage-expressed gene 1 protein (G30226) (Supp. Data S1), along with maturation factors for dual oxidase, an enzyme involved in peroxide formation (Supp. Fig. S8), supporting its designation as macrophage-like based on functional characteristics.

      (21) Figure 7: For Figures 7C to 7H, the authors should increase the font size of gene names and descriptions to ensure legibility in both printed versions and digital formats. To simplify these figures, the authors could consider displaying less differentially expressed genes for each lineage, along with the top genes for each differentiation pathway. If detailed gene information is crucial, they could move the full list to a supplementary table and reference it in the figure legend. Regarding Figure 7I, the authors should reorder the transcription factor genes by cluster and specificity to improve visualization and interpretation, like in Figure 1D.

      Thank you for these valuable suggestions regarding Figure 7. We have revised Figures 7C–H to ensure improved readability. Furthermore, we have simplified these panels by highlighting fewer differentially expressed genes for each lineage. In Figure 7I, we have reordered the transcription factor genes by cluster and specificity, following a layout similar to Figure 1D, to facilitate clearer visualization and interpretation of the data.

      (22) Line 490: The authors should provide more precise references to the specific GO terms and figure panels they are discussing.

      To address this comment, we have revised the sentence and added information in the text indicating where the corresponding figure panels can be found in the manuscript (line 499).

      (23) Line 510: The authors state that "5 cell lineages could be defined," but the subsequent text and Figure 7C to H actually present 6 distinct lineages.

      We have corrected this in the manuscript: six lineages could be defined (line 521).

      (24) Line 534: The authors should consider further investigating the pluripotent potential of cluster 4 cells by exploring known or potential stem cell markers in their scRNAseq data.

      Thank you for raising the possibility that cluster 4 cells have pluripotent potential. In our current analysis, we did not detect any known stem cell or proliferative markers, nor evidence of a clearly defined hematopoiesis site in the hemolymph. Indeed, previous work suggests that oyster hematopoiesis may occur in tissues such as the gills, implying that stem or progenitor cells might not circulate in the hemolymph under homeostatic conditions. Consequently, the absence of proliferative cell populations in our data may partly reflect their absence from the hemolymph, especially in naïve (unstimulated) oysters. To conclusively identify potential progenitor cells and their proliferative activity, further approaches involving deliberate perturbation of hemocyte homeostasis, such as an immunological challenge (e.g., Zymosan treatment) combined with lineage-tracing or proliferation assays, would be necessary. These future investigations would not only clarify whether proliferative cells emerge in the hemolymph in response to environmental or pathological stimuli but also help elucidate the broader cellular pathways underlying oyster immune responses.

      In response to the reviewer’s comment, we have revised the Discussion (lines 695 to 696) and added: “Nevertheless, we did not detect any canonical stem or progenitor cell populations in our dataset, underscoring the need for future investigations (potentially involving immunological challenges and lineage-tracing assays) to clarify whether proliferative cells circulate in the hemolymph or instead reside primarily in tissue compartments.”

      (25) Figure S10: The authors should significantly improve the readability of Figure S10 by increasing the font size. Currently, the small font size makes it impossible for readers to discern the information presented.

      Thank you for highlighting the readability concerns regarding Figure S10. In response to your comment, we have increased the overall size and font of the figure, ensuring that all labels and legends are clearly legible in both printed and digital formats. We believe these adjustments will allow readers to more easily interpret the information presented.

      (26) Line 896: The authors should correct the typographical error on line 896 by deleting the additional bracket.

      In the revised version of the manuscript, we have corrected this typographical error.

      (27) Figure S12: The authors should address the absence of any reference to Figure S12 in the main text of the manuscript.

      The reference to Supp. Figure S12 has been corrected; it was a referencing error between Supp. Figure S11 (in the Discussion, line 670) and Supp. Figure S12.

      Bibliography:

      (1) G. Campadelli-Fiume, D. Collins-McMillen, T. Gianni, A. D. Yurochko, Integrins as Herpesvirus Receptors and Mediators of the Host Signalosome. Annual Review of Virology 3, 215–236 (2016).

      (2) J. P. Luzio, P. R. Pryor, N. A. Bright, Lysosomes: fusion and function. Nat Rev Mol Cell Biol 8, 622–632 (2007).

      (3) A. S. Harney, E. N. Arwert, D. Entenberg, Y. Wang, P. Guo, B.-Z. Qian, M. H. Oktay, J. W. Pollard, J. G. Jones, J. S. Condeelis, Real-Time Imaging Reveals Local, Transient Vascular Permeability, and Tumor Cell Intravasation Stimulated by TIE2hi Macrophage-Derived VEGFA. Cancer Discov 5, 932–943 (2015).

      (4) M. De Palma, R. Mazzieri, L. S. Politi, F. Pucci, E. Zonari, G. Sitia, S. Mazzoleni, D. Moi, M. A. Venneri, S. Indraccolo, A. Falini, L. G. Guidotti, R. Galli, L. Naldini, Tumor-targeted interferon-alpha delivery by Tie2-expressing monocytes inhibits tumor growth and metastasis. Cancer Cell 14, 299–311 (2008).

      (5) M. De Palma, M. A. Venneri, R. Galli, L. Sergi Sergi, L. S. Politi, M. Sampaolesi, L. Naldini, Tie2 identifies a hematopoietic lineage of proangiogenic monocytes required for tumor vessel formation and a mesenchymal population of pericyte progenitors. Cancer Cell 8, 211–226 (2005).

      Reviewer #3 (Public review):

      The paper addresses pivotal questions concerning the multifaceted functions of oyster hemocytes by integrating single-cell RNA sequencing (scRNA-seq) data with analyses of cell morphology, transcriptional profiles, and immune functions. In addition to investigating granulocyte cells, the study delves into the potential roles of blast and hyalinocyte cells. A key discovery highlighted in this research is the identification of cell types engaged in antimicrobial activities, encompassing processes such as phagocytosis, intracellular copper accumulation, oxidative bursts, and antimicrobial peptide synthesis.

      A particularly intriguing aspect of the study lies in the exploration of hemocyte lineages, warranting further investigation, such as employing scRNA-seq on embryos at various developmental stages.

      In the opinion of this reviewer, the discussion should compare and contrast the transcriptome characteristics of hemocytes, particularly granule cells, across the three species of bivalves, aligning with the published scRNA-seq studies in this field to elucidate the uniformities and variances in bivalve hemocytes.

      Reviewer #3 (Recommendations for the authors):

      Minor Concerns:

      (1) In the context of C. gigas, the notable expansion of stress and immune-related genes in its genome stands out. It is anticipated that the article will discuss the expression patterns of classical immune-related genes like TLR and RLR across different cell clusters.

      We appreciate the reviewer's interest in the expression patterns of classical immune-related genes, such as Toll-like receptors (TLRs) and RIG-I-like receptors (RLRs), across different cell clusters in Crassostrea gigas. In our single-cell RNA sequencing (scRNA-seq) analysis, we did not detect significant expression of TLR or RLR genes. Several factors may explain this absence.

      First, scRNA-seq has inherent technical limitations. The droplet-based technology employed in our study captures only a fraction of the transcripts present in each cell, approximately 10–20% (https://kb.10xgenomics.com/hc/en-us/articles/360001539051-What-fraction-of-mRNA-transcripts-are-captured-per-cell). This often results in the underrepresentation of genes with low expression levels, so TLRs and RLRs, which may be expressed at low levels in certain hemocytes, could go undetected. Second, TLRs are typically expressed at low basal levels under resting conditions and are upregulated in response to specific stimuli or pathogenic challenges (1, 2). Because our study analyzed hemocytes in their basal state, the expression levels of these receptors may have been below the detection threshold of the scRNA-seq platform. Third, as highlighted by De Lorgeril et al. (3), the expression of these immune receptors varies with the resistance of the oyster, which further underscores the dynamic and context-dependent nature of TLR and RLR expression.
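      The effect of a 10–20% capture efficiency on low-abundance transcripts can be sketched with a simple probability model. Assuming, as a simplification, that each of a gene's mRNA copies in a cell is captured independently with probability p, the chance of detecting at least one copy is 1 − (1 − p)^n; the model ignores sequencing depth and UMI collisions, and the copy numbers below are hypothetical.

```python
def detection_probability(copies: int, capture_efficiency: float) -> float:
    """Probability that at least one of `copies` transcripts is captured.

    Simplifying assumption: each copy is captured independently with
    probability `capture_efficiency`.
    """
    if copies < 0:
        raise ValueError("copies must be non-negative")
    if not 0.0 <= capture_efficiency <= 1.0:
        raise ValueError("capture_efficiency must be between 0 and 1")
    return 1.0 - (1.0 - capture_efficiency) ** copies


if __name__ == "__main__":
    # Hypothetical per-cell copy numbers for a lowly expressed receptor.
    for efficiency in (0.10, 0.20):
        for copies in (1, 5, 20):
            p = detection_probability(copies, efficiency)
            print(f"p={efficiency:.2f} copies={copies:2d} detect={p:.2f}")
```

      Under this model, a transcript present at only a few copies per cell is missed in most cells, which is consistent with receptors expressed at low basal levels falling below the detection threshold.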

      To comprehensively assess the expression patterns of TLRs and RLRs across different hemocyte clusters, future studies could incorporate complementary approaches, such as targeted enrichment strategies, bulk RNA-seq, or single-cell technologies with higher capture efficiencies. Additionally, analyzing hemocytes under stimulated conditions or comparing oysters with varying levels of resistance could provide insights into the inducible and context-specific expression of these immune receptors.

      (2) Clarification is needed in lines 265-266 regarding the nucleo-cytoplasmic ratio (N/C) terminology to prevent confusion, considering the discrepancy with the results presented in Figure 3.

      We thank the editor for bringing this to our attention and apologize for the discrepancy between the terminology used in the text and the results presented in Figure 3. The text refers to the nuclear-to-cytoplasmic ratio (N/C), while the figure mistakenly displays the inverse ratio, the cytoplasmic-to-nuclear ratio (C/N). We recognize that this inversion may cause confusion and will ensure consistency between the text and the figure.

      To address this, we propose correcting the figure legend and labels in Figure 3 to align with the terminology used in the text (N/C ratio). This will prevent confusion and maintain clarity throughout the manuscript.

      (3) The selection of cluster 4 as the root for pseudotime analysis based on high ribosomal protein expression raises questions. It would be beneficial to elaborate on the inclusion of other genes, such as cell cycle or mitotic-related genes, to validate the pseudotime analysis outcomes.

      We appreciate the reviewer’s insightful comment on the significance of ribosomal proteins in stem cell maintenance.

      Hematopoietic stem cells (HSCs) are a largely cell-cycle-quiescent (G0 phase) population of stem cells with low biosynthetic activity. Upon stimulation or stress, HSCs proliferate and differentiate to produce all hemocyte lineages.

      Ribosomal proteins play a multifaceted role in preserving the balance between stem cell quiescence and activation. By ensuring precise regulation of protein synthesis, they allow stem cells to maintain their undifferentiated state while remaining poised for activation when needed. Furthermore, ribosomal proteins contribute to the cellular stress response, safeguarding stem cells from oxidative damage and other stressors that could compromise their functionality. Importantly, ribosomal biogenesis and the dynamic assembly of ribosomes provide a regulatory mechanism that fine-tunes the transition from self-renewal to differentiation, a critical feature of hematopoietic stem cells (HSCs) and other stem cell types. These mechanisms collectively highlight the indispensable role of ribosomal proteins in stem cell biology, underscoring their relevance to our study's findings.

      In vertebrates, the maintenance of hematopoietic stem cells (HSCs) and hematopoietic homeostasis is widely acknowledged to rely on the proper regulation of ribosome function and protein synthesis (4). This process requires the coordinated expression of numerous genes, including genes that encode ribosomal proteins (RP genes) and those involved in regulating ribosome biogenesis and protein translation. Disruptions or mutations in these critical genes are associated with the development of congenital disorders (5). Among these, Rpl22 (found in cluster 4 with a Log2FC of 1.59) has been shown to play a pivotal role in HSC maintenance by balancing ribosomal protein paralog activity, which is critical for the emergence and function of HSCs (6).

      (4) What is the resolution of the cell clustering employed in the study? Given that cluster 1 potentially encompasses two distinct cell types, Macrophage-Like and Big Granule cells, further sub-clustering efforts and correlation analyses between cluster markers and cell morphologies could aid in their differentiation.

      Thank you for your inquiry regarding the resolution of our cell clustering. As described in the Materials and Methods section, we used the Seurat FindClusters function with a resolution parameter of r = 0.1 for the scRNA-seq dataset. We performed sub-clustering within Cluster 1, resulting in four distinct subclusters. However, despite analyzing various specific markers, we did not identify any marker uniquely associated with the Big Granule Cell (BGC) morphology. Notably, LACC24 specifically marks a subset of cells within Cluster 1, as shown in Supplementary Figure S8, although this gene alone was insufficient to definitively distinguish a distinct BGC population.

      (5) The statement in Line 78 that three hemocyte cell types (blast, hyalinocyte, and granulocyte cells) are primarily identified in C. gigas would benefit from references to substantiate this claim.

      We thank Reviewer #1 for their valuable comments, which have allowed us to further improve our manuscript. We have enriched the introduction with the following addition (lines 79 to 82):

      “Blast-like cells are considered undifferentiated hemocyte types (Donaghy et al., 2010), hyalinocytes appear to play a key role in wound repair (de la Ballina et al., 2020), and granulocytes are primarily involved in immune surveillance. Among these, granulocytes are regarded as the main immunocompetent hemocyte type (Wang et al., 2017).”

      Conclusion:

      The authors largely achieved their primary objective of providing a comprehensive characterization of oyster immune cells. They successfully integrated multiple approaches to identify and describe distinct hemocyte types. The correlation of these cell types with specific immune functions represents a significant advancement in understanding oyster immunity. However, certain aspects of their objectives have not been fully achieved. The lineage relationships proposed on the basis of pseudotime analysis, while interesting, require further experimental validation. Antiviral defense mechanisms, an important aspect of oyster immunity, have not been discussed in depth.

      This study is likely to have a significant impact on the field of invertebrate immunology, particularly in bivalve research. It provides a new standard for comprehensive immune cell characterization in invertebrates. The identification of specific markers for different hemocyte types will facilitate future research on oyster immunity. The proposed model of hemocyte lineages, while requiring further validation, offers a framework for studying hematopoiesis in bivalves.

      Bibliography:

      (1) J. Chen, J. Lin, F. Yu, Z. Zhong, Q. Liang, H. Pang, S. Wu, Transcriptome analysis reveals the function of TLR4-MyD88 pathway in immune response of Crassostrea hongkongensis against Vibrio Parahemolyticus. Aquaculture Reports 25, 101253 (2022).

      (2) Y. Zhang, X. He, F. Yu, Z. Xiang, J. Li, K. L. Thorpe, Z. Yu, Characteristic and Functional Analysis of Toll-like Receptors (TLRs) in the lophotrocozoan, Crassostrea gigas, Reveals Ancient Origin of TLR-Mediated Innate Immunity. PLOS ONE 8, e76464 (2013).

      (3) J. de Lorgeril, B. Petton, A. Lucasson, V. Perez, P.-L. Stenger, L. Dégremont, C. Montagnani, J.-M. Escoubas, P. Haffner, J.-F. Allienne, M. Leroy, F. Lagarde, J. Vidal-Dupiol, Y. Gueguen, G. Mitta, Differential basal expression of immune genes confers Crassostrea gigas resistance to Pacific oyster mortality syndrome. BMC Genomics 21, 63 (2020).

      (4) R. A. J. Signer, J. A. Magee, A. Salic, S. J. Morrison, Haematopoietic stem cells require a highly regulated protein synthesis rate. Nature 509, 49–54 (2014).

      (5) A. Narla, B. L. Ebert, Ribosomopathies: human disorders of ribosome dysfunction. Blood 115, 3196–3205 (2010).

      (6) Y. Zhang, A.-C. E. Duc, S. Rao, X.-L. Sun, A. N. Bilbee, M. Rhodes, Q. Li, D. J. Kappes, J. Rhodes, D. L. Wiest, Control of Hematopoietic Stem Cell Emergence by Antagonistic Functions of Ribosomal Protein Paralogs. Developmental Cell 24, 411–425 (2013).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This paper provides a compelling analysis of chiton genomes, revealing extensive genomic rearrangements despite the group's apparent morphological stasis. By examining five reference-quality genomes, the study identifies 20 conserved molluscan linkage groups that are subject to significant rearrangements, fusions, and duplications in chitons, particularly in the basal Lepidopleurida clade. The high heterozygosity observed adds complexity to genome assembly but also highlights notable genetic diversity.

      We also note the comment from this reviewer that “more information is needed to clarify how this affects genome assembly and evolutionary outcomes.” We strongly agree; although this is outside the scope of the present study, it is an important direction for future work.

      The research challenges the assumption that morphological stability implies genomic conservatism, suggesting that dynamic genome structures may play a role in species diversification. Although limited by the small number of molluscan genomes available for comparison, this study offers valuable insights into evolutionary processes and calls for further genomic exploration across molluscan clades. Some minor comments need to be tackled:

      (1) Line 39: 'major changes'. Please explain more precisely what you mean here.

      Clarified as major morphological change

      (2) Lines 70-73: refer to 'extant' cephalopods.

      Corrected

      (3) There is an inconsistency in the use of "Callochitonida" (lines 76, 85, 140, 145, Table S3, Figure S3) and "Chitonida s.l." (Figures 2, 3, and 4) throughout the text, figures, and supplementary material. To maintain clarity and avoid confusion, I recommend choosing one taxon and using it consistently across all sections of the manuscript. This will ensure coherence and help readers follow the discussion without ambiguity.

      An explanation has been added to the introduction, and all other instances in the text have been changed to Chitonida s.l. for consistency.

      (4) Overall, the conclusions introduce several important topics and additional information that were not addressed earlier in the paper. It would enhance the coherence and impact of the study to introduce these points in the introduction, as they highlight the broader significance and relevance of the research. Integrating these key aspects earlier on would better frame the study's objectives and provide readers with a clearer understanding of its importance from the outset.

      The paragraph about chiton natural history and some additional lines have been moved to the introduction

      (5) Lines 242-245 and 254-256: While I agree with the authors on the remarkable results found in molluscs, particularly in polyplacophorans, I suggest toning down the comparisons with lepidopterans. The current framing may come across as dismissive towards butterflies, which does not seem necessary. It's true that biases exist in studying taxa that are more charismatic due to factors like diversity or aesthetic appeal, but the goal should be to emphasize the value of polyplacophorans without downplaying the significance of butterfly research. Instead, the focus should be on highlighting chitons as an exciting new model for understanding key evolutionary processes like synteny, polyploidy, and genome evolution. This shift would underscore the importance of polyplacophorans in a positive light without diminishing the value of lepidopteran studies.

      This sentence has been rephrased to adjust the tone of this paragraph

      (6) Figure 3: should read 'Polyplacophora'.

      Corrected

      Reviewer #2 (Recommendations for the authors):

      I hope these comments by line number are helpful, despite my lack of experience with comparative genomics:

      We note the general comment from this reviewer that “most chiton genomes seem to be relatively conserved”; this may reflect a misunderstanding arising from our presentation, and we have added some additional notes in the first part of the discussion to ensure that this is clear to all readers.

      The reviewer also suggested that the rearrangements may be “geologically recent events that do not especially represent the general pattern of genome evolution across this ancient molluscan taxon”. To clarify, the (limited) phylogenetic evidence suggests these changes are a longer-term pattern throughout chiton evolution, since chromosomal rearrangements are found when comparing congeneric species (Acanthochitona spp., Fig 4C) and also across orders (Fig 4B). This has been added to the conclusions, as this is clearly an important point that was not adequately explained in the original text.

      (1) Line 72: It is true that adaptive radiations occur and are an interesting general model for how diversification can lead to species-rich taxa. However, there are other "non-adaptive" processes that can lead to geographically isolated species that are not much differentiated in their ecological or morphological diversity. The sentence here implies that such adaptive radiation is a necessary correlation of species richness. I agree that chitons have hardly frozen in time since the Paleozoic.

      This has been clarified by moving some additional natural history aspects of chitons to the introduction, as also suggested by the first reviewer.

      (2) L113: I am curious about how this character optimization was accomplished to allow the authors to reconstruct the HAM (hypothetical ancestral mollusc) chromosome number as 20 when the range of variation in Polyplacophora is 6 to 16 (mode 11), and chitons are part of the sister taxon to conchiferans. Is this dependent on the chromosome numbers found in the outgroup?

      We inferred ancestral linkage groups (“chromosomes”) based on comparison with other gastropods and bivalves noted in the methods; the other study cited (Simakov et al. 2022) used a broader selection of metazoans and also predicted an ancestral Mollusca karyotype of 1N=20.

      (3) L116: "Using five chromosome-level genome assemblies for chitons, we reconstructed the ancestral karyotype for Polyplacophora (more strictly the taxonomic order Neoloricata), and all intermediate phylogenetic nodes to demonstrate the stepwise fusion and rearrangement of gene linkage groups during chiton evolution (Fig. 3)."

      This is probably fine, but I had to struggle to understand what genome events happened between the Acanthochitona species. Are the chromosomes merely ordered and numbered by chromosome size and the switch in position between chromosomes 1 and 3 just has to do with the chromosomes 4+5, so they become the largest chromosome, and the former 1 is now 3? Confusing! The way it is drawn it seems like this implies more genome rearrangement than occurred, whereas if the order was maintained it would be more obvious that there were simply two chromosome fusions.

      The linkage groups are numbered in order of size, which is the typical way they would each be presented if the taxon was illustrated alone. Here this allows the reader to understand how the fusions or rearrangements have shifted the volume of genetic information between groups especially in comparison to the molluscan or polyplacophoran ancestor. In Fig 4 we instead decided to present the linkage groups in a revised form, so that each transition from the nearest ancestor is visible in more detail. We have added these points in the figure caption for Fig 3 which should make it easier for new readers to understand the presentation.

      (4) L481: Typo: A. rubrolineatain should be A. rubrolineata.

      Corrected

      (5) Figure 4: I am a little confused with what is meant by an "Ancestor" in these diagrams. For example, for comparing the two species of Acanthochitona with a hypothetical ancestor, it seems that the ancestor should be like one of the two, not different from both.

      I am looking at Ancestor "3" compared with the Acanthochitona rubrolineata "3" and A. discrepans "4". Again, I assume that the latter is "4" because it is slightly smaller than a new "3" and now the new "3" corresponds to "1" in the other Acanthochitona. This figure does help interpret Figure 3.

      On the point about reconstructing ancestral types: the two species both descended from a common ancestor. In morphology it is sometimes clear that one lineage retains more plesiomorphic character states, but in this case we must assume an equal probability of change in any direction. The reconstructed ancestor is a compromise that estimates the shortest distance to both descendants.

      We understand how the numbering was unclear and potentially distracting. An explanation has been added to the figure caption; we are grateful for this feedback, which will certainly help future readers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study investigates protein-protein interactions (PPIs) within the nuage, a germline-specific organelle essential for piRNA biogenesis in Drosophila melanogaster, using AlphaFold2 to predict interactions among 20 nuage-localizing proteins. The authors identify five novel interaction candidates and experimentally validate three of them, including Spindle-E and Squash, through co-immunoprecipitation assays. They confirm the functional significance of these interactions by disrupting salt bridges at the Spn-E_Squ interface. The study further expands its scope to analyze approximately 430 oogenesis-related proteins, validating three additional interaction pairs. A comprehensive screen of around 12,000 Drosophila proteins for interactions with the key piRNA pathway player, Piwi, identifies 164 potential binding partners. Overall, the research demonstrates that in silico approaches using AlphaFold2 can link bioinformatics predictions with experimental validation, streamlining the identification of novel protein interactions and reducing the reliance on extensive experimental efforts. The manuscript is commendably clear and easy to follow; however, areas for improvement should be addressed to enhance its clarity and rigor.

      Major Concerns:

      (1) While AlphaFold2 was developed and trained primarily for predicting protein structures and their interactions, applying it to predict protein-protein interactions is an extrapolation of its intended use. This introduces several important considerations and risks. First, it assumes that AlphaFold's accuracy in structure prediction extends to interactions, despite not being explicitly trained for this task. Additionally, the assumption that high-scoring models with structural complementarity imply biologically relevant interactions is not always valid. Experimental validation is essential to address these uncertainties, as over-reliance on computational predictions without such validation can lead to false positives and inaccurate conclusions. The authors should expand on the assumptions, limitations, and risks associated with using AlphaFold2 for predicting protein-protein interactions.

      We appreciate the reviewer's point. The prediction of protein-protein interactions using AlphaFold2 relies on the number of conserved homologous sequences and previous conformational data(8) (Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021)). We have added sentences explaining the limitations and risks of the AlphaFold2 prediction method to the Introduction and to the end of the Results and Discussion of the revised manuscript, respectively.

      Page 5, Line 67;

      “AlphaFold2 requires sequence homology information to predict protein-protein interactions and the complex structure model. The reliability of these predictions is basically dependent on the strength of co-evolutionary signals(9).”

      Page 6, Line 84;

      “AlphaFold2 was initially trained to predict the structure of individual proteins(8). Its application to complex prediction is an extrapolative use beyond its original intended scope, and its accuracy remains unverified. Even high-confidence predictions may not correspond to actual interactions, necessitating experimental validation to confirm whether predicted protein dimers truly bind.”

      Page 21, Line 361;

      “This study identifies several potential protein interactions, but AlphaFold2 predictions require caution. Protein-protein interactions involve conformational changes and dependencies on ligands, ions, and cofactors, which AlphaFold2 does not consider, potentially reducing prediction accuracy. Notably, the presence of a high-scoring model in terms of structural complementarity does not guarantee that the interaction is biologically significant.”

      (2) The authors experimentally validated three interactions, out of five predicted interactions, using co-immunoprecipitation (co-IP). They attributed the lack of validation for the other two predictions to the limitations of the co-IP method. However, further clarification on the potential limitations of the co-immunoprecipitation behind the negative results would strengthen the conclusions. While co-IP is a widely used technique, it may not detect weak or transient interactions, which could explain the failure to validate some predictions. Suggesting alternative validation methods such as FRET or mass spectrometry could further substantiate the results. On the other hand, AlphaFold2 predictions are not infallible and may generate false positives, particularly when dealing with structurally plausible but biologically irrelevant interactions. By acknowledging both the potential limitations of co-IP and the possibility of false positives from AlphaFold2, the authors can provide a more balanced interpretation of their findings.

      We appreciate the reviewer's point of view. We used the co-IP method to detect interactions in this study. However, as the reviewer pointed out, weak and transient interactions may not be detected by this method. We have added a note on the detection limits of the co-IP method and on the possibility that the AlphaFold2 method produces false positives to the revised manuscript.

      Page 12, Line 197;

      “While co-immunoprecipitation is a widely used method, it may not always detect weak or transient interactions. Other validation methods, such as FRET or co-localization assays in cultured cells, could offer further insights to support the results. It is also important to note that AlphaFold2's predictions are not definitive and may lead to false positives, particularly when analyzing a large number of interactions.”

      (3) In line 143, the authors state that "This approach identified 13 pairs; seven of these were already known to form complexes, confirming the effectiveness of AlphaFold2 in predicting complex formations (Table 2). The highest pcScore pair was the Zuc homodimer, possibly because AlphaFold2 had learned from Zuc homodimer's crystal structure registered in the database." While the authors mentioned the presence of the Zuc homodimer's crystal structure, they do not provide a systematic bioinformatics analysis to evaluate pairwise sequence identity or check for the presence of existing structures for all the proteins or protein pairs (or their homologs) in databases such as the Protein Data Bank (PDB) or Swiss-Model. Conducting such an analysis is critical, as it significantly impacts the novelty and reliability of AlphaFold2 predictions. For instance, high sequence identity between the query proteins could lead to high-scoring models for biologically irrelevant interactions. Including this information would strengthen the conclusions regarding the accuracy and utility of the predictions.

      We appreciate the reviewer's critical point. The AlphaFold2 method generates a high confidence score when the 3D structure of the protein of interest, or of proteins with very similar sequences, has been solved. We investigated whether the proteins used in this study are included in the 3D structure database (PDB) and added this information as Supplementary Table S2. The following sentences were added to the revised manuscript to explain the structural references from which AlphaFold2 may have learned.

      Page 9, Line 150;

      The structures of the 20 proteins used in this study have been analyzed to varying extents in previous studies (Supplementary Table S2). A complex of Vas and the Lotus domain of Osk has been reported(20), and based on this complex structure, the interaction between Vas and the Tej Lotus domain was predicted with a high score. Although conformational analyses of the RNA helicase domain and the eTud domain have been reported previously, many of these cover only a subset of the regions and are unlikely to affect our predictions in this study.

      The predicted 3D structures and the Predicted Aligned Error (PAE) plots for the 12 pairs are shown in Fig. 1C.

      (4) While the manuscript successfully identifies novel protein interactions, the broader biological significance of these interactions remains underexplored. The manuscript could benefit from elaborating on how these findings may contribute to understanding the piRNA pathway and its implications on germline development, transposon repression, and oogenesis.

      We have added to the revised manuscript a discussion of the potential biological significance of the novel protein-protein interactions presented here, as follows:

      Page 16, Line 268;

      “In this study, three novel protein-protein interactions were predicted and experimentally confirmed. AlphaFold2 also predicted the 3D structure of these complexes, providing insight into the important regions involved in complex formation. These predictions will provide fundamental information to elucidate nuage assembly. Nuage is thought to form by liquid-phase separation; however, direct protein-protein interactions likely occur within protein-dense nuage, facilitating RNA processing. Although the precise roles of individual interactions require further study, characterization of protein-protein interactions within nuage will help clarify the mechanism of piRNA production.”

      Reviewer #1 (Recommendations for the authors):

      Minor Concerns:

      (1) In the Materials and Methods section, the authors thoroughly describe the computational infrastructure (SQUID at Osaka University) and the use of AlphaFold2. However, it would greatly benefit the readers to include a detailed breakdown of the computational cost. Understanding the computational cost (in terms of time, CPU/GPU hours, or other relevant metrics) for predicting 3D structures, especially for 400 protein pairs, would provide valuable insight into the efficiency and scalability of the approach. This would enhance the practical relevance of the methodology section and offer a better understanding of the resources required, beyond just the infrastructure description.

      Thank you for your valuable suggestion. The following descriptions were added in the revised manuscript.

      Page 24, Line 403;

      “The calculation of the MSA took on average 2-4 hours per protein, taking longer for query proteins with more homologs.”

      Page 24, Line 409;

      “Prediction of dimer structures took approximately 1-2 hours per pair on average, depending on protein size. Each user can compute 100-200 pairs per day, but because the supercomputer is shared, job availability varies with overall demand.”

      (2) The manuscript will benefit from a review for grammatical accuracy and clarity, especially in complex explanations. For example, in Line 160: "The predicted dimer structures of Me31B_Tral and Cup_Me31B showed the score of 0.74 and 0.68, respectively (Table 2)." could be revised to "The predicted dimer structures of Me31B_Tral and Cup_Me31B showed scores of 0.74 and 0.68, respectively."

      Thank you very much for pointing it out. Correction has been made to the text pointed out (Page 10, Line 170).

      (3) For alphafold3 webserver, please use (https://alphafoldserver.com/) instead of (https://golgi.sandbox.google.com/about).

      Thank you very much for pointing it out. The URL has been changed in the revised manuscript (Page 25, Line 422).

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors use AlphaFold2 to identify potential binding partners of nuage localizing proteins.

      Strengths:

      The main strength of the paper is that the authors experimentally verify a subset of the predicted interactions.

      Many studies have been performed to predict protein-protein interactions in various subsets of proteins. The interesting story here is that the authors (i) focus on an organelle that contains many intrinsically disordered proteins and (ii) experimentally verify some (but not all) predictions.

      Weaknesses:

      Identification of pairwise interactions is only a first step towards understanding complex interactions. It is pretty clear from the predictions that some (but certainly not all) of the pairs could be used to build larger complexes. AlphaFold easily handles proteins up to 4-5000 residues, so this should be possible. I suggest that the authors do this to provide more biological insights.

      We thank the reviewer for these kind suggestions. In this study, protein dimers were screened on the assumption that the two proteins bind 1:1; in some cases, multiple binding partners were predicted for a single protein. For example, Spn-E was predicted to bind both Tej and Squ. Therefore, we used the latest AlphaFold3 to predict the trimeric Spn-E_Squ_Tej structure, which was already described in the original manuscript. In addition, as suggested by the reviewer, other possible trimer predictions have been added to the revised manuscript as follows:

      Page 15, Line 249;

      “In addition to the Spn-E_Squ_Tej complex, the 1:1 dimer predictions described above suggested further potential trimers (Fig. 1; Supplemental Fig. S4). For example, Tej is predicted to bind both Vas and Spn-E, and AlphaFold3 indeed predicted a Vas_Tej_Spn-E trimer in which Tej’s Lotus and eTud domains interact with Vas and Spn-E, respectively. However, Lin et al. reported that in the Drosophila ovary Tej binds either Vas or Spn-E, but not both simultaneously(17), suggesting that the predicted trimers may be weak or transient. Similarly, the BoYb_Vret_Shu and Me31B_Cup_Tral trimers remain hypothetical and require experimental verification (Supplemental Fig. S4).”

      Another weakness is the use of a non-standard name for "ranking confidence": the authors call it the pcScore, while the name used in AlphaFold (and many other publications) is ranking confidence.

      “pcScore” has been changed to “ranking confidence”

      Reviewer #2 (Recommendations for the authors):

      (1) The pcScore is actually what is called RankingConfidence. Also, many other measures have been developed by other groups (based on PAE for instance) - these could be compared.

      Thank you for your valuable suggestions. While other indicators are being developed, we have computed the affinity of each complex from its predicted three-dimensional structure using the PRODIGY web server. This analysis has been added to the revised manuscript as follows:

      Page 18, Line 300;

      “The ranking confidence score reflects the reliability of AlphaFold2's predicted structure but does not always ensure accuracy. Therefore, we assessed complex affinity based on the predicted three-dimensional structures (Supplemental Table S6). Most dimers with high ranking confidence scores exhibited low Kd values indicative of high affinity, while some showed high Kd values indicating weak interactions (Supplemental Table S6). For example, the Baf_Vas complex had a high AlphaFold2 ranking confidence score (0.85) but a relatively high Kd value (1.1E-4 M), indicating low affinity. Consistently, Baf_Vas binding was not detected in Co-IP experiments (Fig. S5C). Although accurate Kd prediction may be limited due to insufficient structural optimization, it could serve as a valuable secondary screening tool following AlphaFold2 predictions.”
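      As a rough illustration (not part of the authors' analysis), a predicted Kd can be placed on a binding free-energy scale with the standard relation ΔG = RT ln Kd (1 M standard state). The sketch below assumes 25 °C, since the temperature used for Supplemental Table S6 is not stated here; the function names are hypothetical. On this scale the Baf_Vas value of 1.1E-4 M corresponds to roughly -5.4 kcal/mol, compared with about -12.3 kcal/mol for a nanomolar binder.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def dg_from_kd(kd_molar: float, temp_c: float = 25.0) -> float:
    """Binding free energy (kcal/mol) from a dissociation constant (M).

    Uses dG = RT ln(Kd) with a 1 M standard state; 25 C is an assumption.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * (temp_c + 273.15) * math.log(kd_molar)


def kd_from_dg(dg_kcal: float, temp_c: float = 25.0) -> float:
    """Inverse relation: dissociation constant (M) from dG (kcal/mol)."""
    return math.exp(dg_kcal / (R_KCAL * (temp_c + 273.15)))


# Baf_Vas value quoted above versus a hypothetical nanomolar binder.
print(f"{dg_from_kd(1.1e-4):.1f} kcal/mol")  # -5.4 kcal/mol
print(f"{dg_from_kd(1e-9):.1f} kcal/mol")    # -12.3 kcal/mol
```

A more negative ΔG means tighter binding, so the conversion makes it easy to see how far a high-ranking-confidence pair such as Baf_Vas sits from a typical stable complex.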

      (2) A statistical estimate of FDR for binding to the PIWI protein needs to be estimated. It is possible that 1.6% of random proteins (from another species for instance) also obtain ranking confidence over 0.6, i.e. how trustful are the predictions?

      Thank you for the insightful comments. Unfortunately, it is difficult to infer the FDR from the value of ranking confidence. Presumably, the accuracy will vary depending on the target protein, since the number of homologs and known conformational information will differ. In the case of Piwi, the FDR is expected to be relatively low since the conformation of the protein on its own has been experimentally determined. However, even for Piwi complexes with high values of ranking confidence, the estimated affinity varied from high to low (Supplemental Table S6). Therefore, it may be useful to conduct further secondary evaluation for AlphaFold2 predictions with high ranking confidence.

      (3) Identification of pairwise interactions is only a first step towards understanding complex interactions. It is pretty clear from the predictions that some (but certainly not all) of the pairs could be used to build larger complexes. AlphaFold easily handles proteins up to 4-5000 residues, so this should be possible. I suggest that the authors do this to provide more biological insights.

      Already mentioned above.

      (4) The comparisons of ranking confidence vs ipTM/pTM are less interesting (by definition ranking confidence is virtually identical to ipTM).

      Thank you for the thoughtful comment. As the reviewer pointed out, there is little difference between ranking confidence and ipTM, as shown in Fig. 1A. A high pTM value (well-folded proteins) tends to increase ranking confidence, while a low pTM value (many disordered regions) tends to decrease it. Therefore, it may be useful to adjust the confidence threshold for each protein pair.
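      For reference, AlphaFold-Multimer ranks its models by ranking confidence = 0.8·ipTM + 0.2·pTM, which explains why ranking confidence closely tracks ipTM. The minimal sketch below (function name hypothetical; the ipTM and pTM values are illustrative, not taken from the manuscript) shows how, at a fixed ipTM of 0.6, a pair with many disordered regions (low pTM) falls below a 0.6 cutoff while a well-folded pair stays above it.

```python
def ranking_confidence(iptm: float, ptm: float) -> float:
    """AlphaFold-Multimer model-ranking score: 0.8 * ipTM + 0.2 * pTM."""
    for name, value in (("ipTM", iptm), ("pTM", ptm)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return 0.8 * iptm + 0.2 * ptm


# Same interface score (ipTM), different overall foldedness (pTM).
well_folded = ranking_confidence(iptm=0.6, ptm=0.9)
disordered = ranking_confidence(iptm=0.6, ptm=0.3)
print(f"{well_folded:.2f} {disordered:.2f}")  # 0.66 0.54
```

Because pTM carries only 20% of the weight, it can shift the score by at most 0.2, but near a fixed cutoff that is enough to change which pairs are called, which is consistent with the suggestion to adjust thresholds per pair.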

    1. Author response:

      We thank the reviewers for the detailed evaluations and thoughtful comments, which have improved the clarity and readability of this manuscript. We have responded to all reviewer comments and incorporated their suggested changes into the text and figures. We have also included new experimental results suggested by reviewer 2, which further strengthen our main conclusion.

      Point-by-point description of the revisions

      Reviewer #1:

      (1) Introduction, page 3: The statement "Single dimeric kinesin moves processively along microtubules in a hand-over-hand manner by alternately moving the two heads in an 8-nm step toward the plus-end of the microtubule" is inaccurate. The kinesin heads take ~16 nm steps, while the center of mass advances in ~8 nm increments. Please adjust the wording accordingly.

      (2) Introduction, page 5: In the sentence "These results are consistent with the closed and open conformations of the nucleotide-binding pocket in the rear and front heads of microtubule-bound kinesin dimers observed in cryo-electron microscopy (cryo-EM) studies," I recommend changing the order to align with the previous sentence. The correct order would be "These results are consistent with the open and closed conformations of the nucleotide-binding pocket in the front and rear heads."

      We thank the reviewer for pointing out our misunderstandings. We have corrected these sentences accordingly (lines 45-47 and lines 111-112).
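      The corrected statement can be checked with an idealized geometric sketch (a toy model, not the authors' analysis). It assumes the two heads are bound 8 nm apart and that, on each step, the rear head passes the bound front head and rebinds 8 nm ahead of it. Each head then moves 16 nm per step, while the midpoint between the heads, used here as a proxy for the center of mass, advances 8 nm. The function name and the spacing parameter are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Step:
    head_displacement_nm: float  # distance moved by the stepping head
    com_displacement_nm: float   # advance of the midpoint between the heads


def hand_over_hand(n_steps: int, head_spacing_nm: float = 8.0) -> list[Step]:
    """Idealized hand-over-hand walk along one protofilament."""
    rear, front = 0.0, head_spacing_nm
    steps = []
    for _ in range(n_steps):
        com_before = (rear + front) / 2
        # The rear head detaches, passes the bound front head,
        # and rebinds one spacing ahead of it.
        new_front = front + head_spacing_nm
        head_move = new_front - rear
        # Roles swap: the old front head is now the rear head.
        rear, front = front, new_front
        com_after = (rear + front) / 2
        steps.append(Step(head_move, com_after - com_before))
    return steps


for s in hand_over_hand(3):
    print(s.head_displacement_nm, s.com_displacement_nm)  # 16.0 8.0
```

The model ignores one-head-bound intermediates and off-axis motion; it only shows why an 8 nm step of the dimer corresponds to a 16 nm step of each head.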

      Reviewer #2:

      MAJOR CONCERNS

      Limitations of this study: The authors need to discuss the limitations of their work. 1) They used a cys-lite kinesin mutant and introduced new surface-exposed cysteines. These mutants have lower kcat values than WT. 2) They used fluorescently labeled ATP molecules, which are hydrolyzed 10 times slower than unlabeled nucleotides. 3) They still observe crosslinking under reducing conditions and partial (but almost complete) crosslinking under oxidized conditions. 4) They assumed that the cysteine-crosslinked orientation mimics the orientation of the neck linker in the front and rear heads. The authors clearly pointed to these issues in the Results section. While these assumptions are also supported by several control experiments, the authors need to acknowledge some of these limitations in the Discussion as well.

      We have now reiterated the key caveats in the Discussion and added to the Results section those points that were not mentioned in the original manuscript and do not affect the conclusions. We also added a summary of the limitations and caveats to the first paragraph of the Discussion section (lines 425-431).

      (1) We added a sentence in the Results section stating that the ATP-binding kinetics of the Cys-light construct were consistent with previous studies: “First, we demonstrated that k<sub>+1</sub> and k<sub>-1</sub> of the wild-type head without Cys-modification were unchanged after oxidization (Table 1) and were comparable to those previously reported (Cross, 2004)” (lines 163-166). The reduced kcat values of the cysteine pair-added mutants before crosslinking were primarily due to a reduced microtubule association rate (data not included in this manuscript). We have added a sentence in the Results section describing the kcat results as follows: “The reduced ATPase activity primarily results from a decreased microtubule association rate (data to be presented elsewhere) with little change in ATP binding or microtubule dissociation rates (Table 1).” (lines 144-146).

      (2) Fluorescently labeled ATP was used to determine the ATP off-rates of the E236A mutant monomer and of the E236A rear head of the E236A/WT heterodimer. Two caveats could lead to underestimation of the ATP off-rate in these measurements: 1) the off-rate of Alexa-ATP from the head may be lower than that of unmodified ATP, as Alexa-ATP-driven motility showed a 10-fold reduced velocity; and 2) the ATP off-rate of the E236A mutant may differ from that of the rear head in the wild-type dimer, since the E236A mutation likely stabilizes the neck linker-docked state more strongly than it is stabilized in the rear head of the wild-type dimer. These points are crucial for evaluating the measured ATP off-rate and ATP affinity, so we have added the following sentences to the Discussion section: “We note, however, that this K<sub>d</sub> of ATP may somewhat underestimate the true value in wild-type kinesin for two reasons: first, the E236A mutation likely stabilizes the neck linker-docked, closed state more than in the rear head of the wild-type dimer (Rice et al., 1999), and second, the Alexa-ATP used to measure the ATP off-rate of E236A head showed ~10-fold smaller velocity compared to unmodified ATP, partly due to a slower ATP off-rate (Figure 2-figure supplement 3).” (lines 449-454).

      (3) Under reducing conditions, the rear-head crosslink sample still contained 30% crosslinked species, while under oxidizing conditions, the front-head crosslink sample contained 11% un-crosslinked species (Figure 1-figure supplement 1). These heterogeneities likely affect the rate constants K<sub>-1</sub> of the rear-head crosslink and K<sub>2</sub> of the front-head crosslink, as crosslinked and un-crosslinked species showed significantly different rate constants. However, we did not use the rear-head crosslink result to determine K<sub>-1</sub>, since ATP hydrolysis likely occurred before reversible ATP dissociation; instead, we used the E236A monomer to estimate the K<sub>-1</sub> of the rear head. In addition, the K<sub>2</sub> of the front-head crosslink was further validated using the E236A/WT heterodimer, as described in our response to point (4).

      (4) This is an important point, and we therefore conducted experiments using the E236A/WT heterodimer (including new measurements of the ATP-binding kinetics of the front head) and obtained consistent results. To address this point, we have revised the following sentences in the Discussion: “In the front head, backward orientation of the neck linker has little effect on ATP binding and dissociation rates, both when measured for a monomer crosslink (Figure 2A, B) and for the front head of an E236A-WT heterodimer (Figure 4B, C, F).” (lines 432-433); “However, we found that the ATP-induced detachment rates from microtubule (K<sub>2</sub>) were similarly reduced for both the front head crosslink (7.0 s<sup>-1</sup>; Figure 3A) and the front WT head of the E236A/WT heterodimer (6.3 s<sup>-1</sup>; Figure 6D), suggesting that a step subsequent to ATP binding is gated in the front head.” (lines 437-441).

      Line 238: the authors wrote that "forward constraint on the neck linker in the rear head does not significantly accelerate the detachment from the microtubule." Can the authors comment on why the rear-head-like construct has a low affinity for microtubules even in the absence of ATP (Line 220)? I believe that the low affinity of the head in this conformation is more striking (and potentially more important) than the changes they observe in detachment rates. The authors should also consider that they might not be able to reliably measure the changes in the dissociation rate in single-molecule assays of this construct (especially if the release rate of the rear head in the oxidized condition becomes much higher than that of WT). The kymographs show infrequent and brief events, which raises doubts about how reliably they can measure the release rates under those imaging conditions. Higher motor concentrations and faster imaging rates may address this concern.

      The low microtubule affinity of the rear-head-like crosslink stems from an extremely slow ADP release rate upon microtubule binding, not from a fast microtubule-detachment rate. Using stopped-flow measurements of microtubule-binding kinetics (microtubule-stimulated mant-ADP release and microtubule association rates), we found that the rear-head-crosslink resulted in a 2,000-fold decrease in the microtubule-stimulated ADP-release rate. This finding also explains the reduced ATPase of the rear-head-crosslink (Figure 1E). Since this low microtubule-affinity state occurs in the ADP-bound state rather than the ATP-bound state, we hypothesized that the neck-linker docked ADP-bound state cannot effectively bind to microtubules, requiring neck-linker undocking for microtubule binding (Mattson-Hoss et al., Proc. Natl. Acad. Sci., 111, 7000-7005 (2014)). While we acknowledge that understanding slow microtubule binding in the neck linker docked state is important for elucidating the mechanism and regulation of microtubule-binding of the head, this paper focuses specifically on the mechanism and regulation of “microtubule-detachment”. We plan to present these microtubule-binding kinetics data in a separate manuscript currently in preparation.

      To explain the low microtubule affinity of the rear-head-crosslink, we added the following explanation to the text: “because this constraint on the neck linker dramatically reduces the microtubule-activated ADP release rate (data to be presented elsewhere), creating a weak microtubule binding state” (lines 226-228).

      Although the rear-head crosslinking construct under oxidizing conditions showed fewer fluorescent spots per kymograph (image) because of its low microtubule binding rate, we collected more than one hundred spots by recording additional microscope movies (N = 140; Figure 3-figure supplement 2B), ensuring sufficient data for statistical analysis.

      Figure 2: How do the rates shown in Figure 2A-B compare to the previous kinetics studies in the field? The authors compare the dissociation rate of WT measured in rapid mixing experiments to that of E236A in smFRET assays. It is not clear whether these comparisons can be made reliably using different assays. Can the authors perform rapid mixing of E236A or try to determine the rate for the WT from smFRET trajectories?

      The ATP on- and off-rates we obtained are comparable to those from previous stopped-flow measurements of ATP binding to monomeric kinesin-1 on microtubules, which are 2-5 µM<sup>-1</sup>s<sup>-1</sup> and ~150 s<sup>-1</sup>, respectively (summarized in the review by Cross (2004)). We added a sentence as follows: “First, we demonstrated that K<sub>+1</sub> and K<sub>-1</sub> of the wild-type head without Cys-modification were unchanged after oxidization (Table 1) and were comparable to those previously reported (Cross, 2004).” (lines 163-166).
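      As a rough consistency check, assuming a simple one-step binding scheme, the ATP dissociation constant follows from the ratio of the rates, K<sub>d</sub> = k<sub>-1</sub>/k<sub>+1</sub>. With the literature values quoted above, this gives K<sub>d</sub> of roughly 30-75 µM. The same ratio shows why an underestimated off-rate (the Alexa-ATP caveat discussed in point (2) of our response to Reviewer #2) lowers the apparent K<sub>d</sub> by the same factor. The helper name is illustrative.

```python
def kd_from_rates(k_on_per_uM_s: float, k_off_per_s: float) -> float:
    """K_d (in uM) for a simple one-step binding scheme: K_d = k_off / k_on."""
    return k_off_per_s / k_on_per_uM_s


k_off = 150.0  # s^-1, literature value for ATP release (Cross, 2004)
for k_on in (2.0, 5.0):  # uM^-1 s^-1, literature range
    print(f"k_on = {k_on} uM^-1 s^-1 -> K_d = {kd_from_rates(k_on, k_off):.0f} uM")
```

      Because K<sub>d</sub> is linear in the off-rate, any systematic underestimate of k<sub>-1</sub> carries over one-to-one into the affinity estimate.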

      As the reviewer pointed out, the rapid mixing and smFRET data cannot be directly compared due to the differences in temporal resolution and fluorescent probe used. In Figure 2E (2F in the revised version), we measured ATP dissociation rate for both WT and E236A using smFRET. Due to the lower temporal resolution, we could not accurately determine ATP binding rate using smFRET. Therefore, to compare the ATP binding rate between WT and E236A heads, we now have added stopped-flow measurements of mant-ATP binding to the E236A monomer, as shown in Fig. 2C and Figure 2-supplement 2, and described in the text (lines 182-185).

      Line 396: One of the most significant conclusions of this work is that the backward orientation of the neck linker has little effect on ATP binding to the front head. This is only supported by the results shown in Fig. 2A-B. Can the authors perform/analyze smFRET assays on the E236A/WT heterodimer to directly show whether the ATP binding rate to the WT head is affected or not affected by the orientation of the neck linker of the WT head?

      We agree with the reviewer that our finding about ATP binding to the front head is potentially significant in the kinesin field, as it has been widely believed that ATP binding is suppressed in the front head. In our original manuscript, this conclusion was supported only by the measurement of the ATP on-rate of the front-head-crosslink, which may differ from the front head of a dimer in which the backward orientation of the neck linker is maintained by backward strain. Although the reviewer suggested performing smFRET experiments using the E236A/WT heterodimer, smFRET has relatively low temporal resolution (50-100 fps) and cannot accurately measure the frequency of ATP binding, so we used this technique only to determine ATP off-rates. In this revised manuscript, we have added stopped-flow experiments to separately measure ATP binding to the front and rear heads of the E236A/WT heterodimer. By labeling the rear E236A head with a fluorophore to quench the signal of mant-ATP bound to the rear head, we successfully measured the mant-ATP binding rate to the front head. We found that the ATP-binding rate to the front head was comparable to that of an unconstrained monomer head, providing direct evidence for our conclusion. The revised version includes Fig. 4 A-C (with Figure 4-supplement 2; Figs. 4 and 5 are swapped in order) showing the kinetics of ATP binding to the front and rear heads of the E236A/WT heterodimer, with corresponding text in the Results section (lines 315-324).

      MINOR CONCERNS

      Lines 31 and 32: I recommend replacing "ATP affinity" with "ATP binding rate" or "the dissociation of ATP" to be more specific. This is because they do not directly measure the affinity (Kd), but instead measure the on or off rates.

      Line 41: Replace "cellar" with "cellular".

      Line 83: The authors should cite Andreasson et al. here.

      We have corrected these sentences accordingly (lines 31, 40, 85).

      Lines 83-86: It seems this sentence belongs to the next paragraph. It also needs a citation(s).

      This statement lacks experimental evidence and may confuse readers, so we have removed it for clarity.

      Line 151: It would be helpful to add a conclusion sentence at the end of this paragraph to explain what these results mean to the reader.

      A conclusion sentence has been added to this paragraph: “These results demonstrate that neck linker constraints in both forward and rearward orientations inhibit specific steps in the mechanochemical cycle of the head.” (lines 151-153).

      Lines 175-180: I recommend combining and shortening these sentences, as follows, to avoid confusing the reader: "To detect the ATP dissociation event of the rear head, we employed a mutant kinesin with a point mutation of E236A in the switch II loop, which almost abolishes ATPase hydrolysis and traps in the microtubule-bound, neck-linker docked state,"

      We have corrected these sentences accordingly (lines 179-181).

      Line 314: "which was rarely observed ...". This is out of place and confusing as is. I recommend moving this sentence after the sentence that ends in Line 295.

      This sentence explains how the dark-field microscopy data was analyzed to determine whether the labeled head was in the leading or trailing position before detaching from the microtubule, but the explanation needs clarification. We removed the phrase “which was rarely observed for E236A-WT heterodimer” and simplified this sentence as follows: “Moreover, these observations allow us to distinguish whether the gold-labeled WT head was in the leading or trailing position just before microtubule detachment; the backward displacement of the detached head indicates that the labeled WT head occupied the leading position prior to detachment (Figure 5-figure supplement 1).” (lines 347-351).

      Line 300: Can the authors comment on why E236A/WT has a substantially lower ATPase rate than WT homodimer? Is it possible to determine which step in the catalytic cycle is inhibited?

      We demonstrated that the k<sub>2</sub> (microtubule-detachment rate) of the front head matched the ATP turnover rate of the E236A/WT heterodimer (Figure 6 B and E), suggesting that the inhibited step occurs after ATP binding in the front head. In contrast, the rear E236A head showed virtually no ATP hydrolysis activity: in high-speed dark-field microscopy, we only rarely observed forward steps caused by detachment of the rear E236A head from the microtubule, approximately once every few seconds (Figure 5-figure supplement 1). We added a sentence to the text as follows: “As described later, the reduced ATPase rate results from suppressed microtubule detachment of the front WT head, while the rear E236A head is virtually unable to detach from microtubules” (lines 311-313).

      Line 323: Is the unbound dwell time unchanged?

      The unbound dwell time exhibited a weak ATP dependence, which we described only in Figure 5-supplement 2 (Figure 4-supplement 2 in the previous version). We observed three distinct phases in the unbound dwell time based on mobility differences, with ATP dependence appearing only in the third phase. This finding suggests that ATP binding to the microtubule-bound E236A head is sometimes necessary for the detached WT head to rebind to the forward tubulin-binding site, indicating that the microtubule-bound E236A head occasionally releases ATP during the one-head-bound state (without forward neck linker strain). To describe the ATP dependence of the unbound dwell time, we added the following sentences to the main text: “In contrast, the dwell time of the unbound state of the gold-labeled WT head showed weak ATP dependence (Figure 5-figure supplement 2), indicating that the rear E236A head occasionally releases ATP when the front head detaches from the microtubule and the neck linker of E236A head becomes unconstrained. This finding further supports the idea that forward neck linker strain plays a crucial role in reducing the reversible ATP release rate.” (lines 372-377).

      Line 331: I recommend replacing "ATP-induced detachment" with "nucleotide-induced detachment" for clarity.

      We have revised the phrase accordingly (line 371).

      Line 344: I recommend replacing "affinity" with "forward strain prevents the release of the nucleotide" or similar to avoid confusion. Forward strain reduces the off-rate of the bound nucleotide, rather than allowing ATP to bind more efficiently to the rear head.

      We agree with the reviewer’s comment and have corrected this sentence accordingly (line 338).

      Lines 376-385: G7-12 constructs are introduced in Figure 6, but the results in this paragraph are shown in Figure 5. They should be moved to Figure 6 to avoid confusion.

      To improve the readability, we have reorganized Figures 4-6, such that all the figure panels related to the neck linker extended mutants are shown in Figure 6; Figure 5D has been moved to Figure 6F.

      Line 421: delete "not" before "does not".

      We have corrected this typo.

      Lines 433-441: Unless I am mistaken, more recent work in the kinesin field showed that backward trajectories of kinesin 1 reported by Carter and Cross are due to slips from the microtubule rather than backward processive runs of the motor.

      The slip motion demonstrated by Sudhakar et al. (2021) differs from the backstep motion reported by Carter and Cross (and many other laboratories). Slip motion occurs after kinesin detaches from the microtubule and continues until the bead returns to the trap center. In contrast, backstep motion occurs during processive movement when the trap force either exceeds or approaches the stall force. The kinetics of these motions also differ significantly: slip steps occur with a dwell time of 71 µs and are independent of ATP concentration, while backsteps take ~0.3 s (at 1 mM ATP) and depend on ATP concentration. These differences indicate that slip motion is phenomenologically distinct from backsteps occurring under supra-stall or near-stall force.

      Line 474: Replace "suppresses" with "suppressed".

      We have corrected this typo.

      Figure 4E: I would plot these results with increasing ATP concentration on the x-axis.

      We formatted Figure 4E to match Figure 4b from Isojima et al. (Nature Chem. Biol. 2015), to emphasize the difference in ATP dependence of the front and rear head.

      Figure 4B: The authors should explain how they distinguish between bound and unbound states in the main text or figure legends. For example, it is not clear how the authors score when the motor rebinds to the microtubule in the first unbinding event shown in Figure 4B (displacement plot).

      The method was described in the Materials and Methods section, but we have now described how to distinguish between bound and unbound states in the main text as follows: “Unlike the unbound trailing head of wild-type dimer that showed continuous mobility (Isojima et al., 2016), the unbound WT head of E236A-WT heterodimer exhibited a low-fluctuation state in the middle (Figure 5B, s.d. trace). This low-fluctuation unbound state was distinguishable from the typical microtubule-bound state, having a shorter dwell time of ~5 ms compared to the bound state and positioning backward, closer to the E236A head, relative to the bound state (Figure 5-figure supplement 2).” (lines 351-356).

      Reviewer #3:

      Minor Issues:

      - Line 22, Abstract - The phrase "move in a hand-over-hand manner" could be clearer if phrased as "move in a hand-over-hand fashion" to improve readability.

      We changed the word “manner” to “process” (line 23).

      - Abstract - Neck linker conformation in the leading head: The sentence "We demonstrate that the neck linker conformation in the leading kinesin head increases microtubule affinity without altering ATP affinity" would benefit from defining this conformation as "backward" for clarity.

      - Abstract - Neck linker conformation in the trailing head: The sentence "The neck linker conformation in the trailing kinesin head increases ATP affinity by several thousand-fold compared to the leading head, with minimal impact on microtubule affinity" should also clarify that this conformation is "forward."

      We have corrected these sentences accordingly (line 30, 32).

      - Abstract - Conformation-specific effects: The authors mention conformation-specific effects in the neck linker structure but do not define the neck linker's conformation or the motor domain's (MD) conformation. Clarifying these conformational changes would improve the explanation of how they promote ATP hydrolysis and dissociation of the trailing head before the leading head detaches from the microtubule, thereby providing a kinetic basis for kinesin's coordinated walking mechanism.

      We have revised the last sentence of the abstract accordingly by specifying the neck linker’s conformation as follows: “In combination, these conformation-specific effects of the neck linker favor ATP hydrolysis and dissociation of the rear head prior to microtubule detachment of the front head, thereby providing a kinetic explanation for the coordinated walking mechanism of dimeric kinesin.” (lines 34-37).

      - Line 306 - Use of ATP in the E236A-WT heterodimer: In discussing the "ATP-induced detachment rate of the WT head in the E236A-WT heterodimer," the authors should consider justifying their choice of ATP over ADP for inducing microtubule (MT) dissociation. Since ATP typically promotes tighter MT binding and ATP turnover is reduced in forward-positioned WT heads, it may be unclear to some readers why ATP was chosen.

      We measured the ATP-induced detachment rate k<sub>2</sub> of the front head of the E236A-WT heterodimer to validate our findings from the front-head-crosslinked monomer experiments, which demonstrated reduced k<sub>2</sub> after oxidation. To clarify this point, we have now included ATP binding kinetics measurements for both front and rear heads of the E236A-WT heterodimer, as suggested by reviewer 2. These additional data demonstrate consistency between the results from the crosslinked monomer and E236A-WT heterodimer experiments.

      - Discussion - Backward-oriented neck linker in the front head: The discussion mentions that the backward-oriented neck linker in the front head reduces its ATP-induced detachment rate, suggesting that a step after ATP binding (e.g., isomerization, ATP hydrolysis, or phosphate release) is gated in the front head. However, the authors do not clarify that the backward neck linker orientation would imply the nucleotide pocket should be open or at least not fully closed, thus inhibiting ATP turnover. This is important because, as demonstrated in other studies, full closure of the nucleotide pocket is linked to neck linker docking. This point should be addressed earlier in the discussion.

      We have addressed this point by revising this sentence as follows: “These results are consistent with an inability of the front head to fully close its nucleotide pocket to promote ATP hydrolysis and Pi release (Benoit et al., 2023), as will be discussed later.” (lines 441-443).

    1. Author response:

      We thank the reviewers for their thorough review of our manuscript and their constructive feedback. We will address their comments and concerns in a point-by-point response at a later stage, but in the meantime we would like to clarify some minor misunderstandings to avoid confusing readers.

      - In regard to population ablation: When investigating the contribution of population size to reconstruction quality, we used 12.5, 25, 50 or 100% of the recorded neuronal population, which corresponds to ~1000/2000/4000/8000 neurons per animal. We did not produce reconstructions from only 1 neuron.

      - In regard to the training of the transparency masks: The transparency masks were not produced using the same movies we reconstructed. We apologize for the lack of clarity on this point in the manuscript. We calculated the masks using an original model instance rather than the retrained instances used in the rest of the paper. Specifically, the masks were calculated using the original model instance ‘fold 1’ and data fold 1, which is its validation fold. In contrast, the model instances used in the paper for movie reconstruction were retrained while omitting the same validation fold (fold 0) across all instances, and all the reconstructed movies in the paper are from data fold 0.

      - In regard to reconstruction based on predicted activity: We always reconstructed the videos from the true neural responses, not the predicted neural responses, with the exception of the Gaussian noise and drifting grating stimuli in Figure 4 and Supplementary Figure S2, for which no recorded neural activity was available.

    1. Author response:

      We thank both reviewers for their suggestions on improving our manuscript, which is focused on demonstrating that the C3a-C3aR axis modulates trained immune responses in alveolar macrophages. The Short Report format precludes separating the Results and Discussion sections. However, we will work towards a clearer presentation of findings and providing a more comprehensive interpretation of the data in the Revision, by addressing the points brought up by both Reviewers.

      We agree with the suggestions from Reviewer 1 that (1) other cell types such as dendritic cells, neutrophils, and endothelial cells can also be involved in immune training, and (2) macrophages have other activities beyond releasing inflammatory cytokines, and will clarify both these points in the Revision. The mechanism of C3 being cleaved intracellularly and binding to lysosomal C3aR involves cathepsin-dependent cleavage of C3 to C3a and has been experimentally proven (Liszewski et al. Immunity 2013). However, we will clarify this mechanism in the revision. We also acknowledge that the observations need to be validated in human-based models. Currently, we do not have access to an adequate representation of human alveolar macrophages for our ex vivo testing to account for individual-level variation in immune responses. However, we anticipate this work will form the basis of these future studies.

      We also appreciate Reviewer 2’s suggestions regarding demonstrating the resolution of acute inflammation after the initial exposure to heat-killed Pseudomonas. We will address this critique by performing additional experiments, which will be included in the Revision. We also agree that the responses of trained C3-deficient cells should be compared to untrained C3-deficient controls after the LPS challenge. We will include this data in the Revision, in addition to the requested data for Figures 3 and 4. We would like to clarify that we do not observe baseline differences between untrained C3-sufficient (wildtype) and C3-deficient alveolar macrophages, even in their glycolytic capacity, and thus, anticipate that our revised data will strengthen the conclusions from the original manuscript.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to characterize neurocomputational signals underlying interpersonal guilt and responsibility. Across two studies, one behavioral and one fMRI, participants made risky economic decisions for themselves or for themselves and a partner; they also experienced a condition in which the partners made decisions for themselves and the participant. The authors also assessed momentary happiness intermittently between choices in the task. Briefly, results demonstrated that participants' self-reported happiness decreased after disadvantageous outcomes for themselves and when both they and their partner were affected; this effect was exacerbated when participants were responsible for their partner's low outcome, rather than the opposite, reflecting experienced guilt. Consistent with previous work, BOLD signals in the insula correlated with experienced guilt, and insula-right IFG connectivity was enhanced when participants made risky choices for themselves and safe choices for themselves and a partner.

      Strengths:

      This study implements an interesting approach to investigating guilt and responsibility; the paradigm in particular is well-suited to approach this question, offering participants the chance to make risky v. safe choices that affect both themselves and others. I appreciate the assessment of happiness as a metric for assessing guilt across the different task/outcome conditions, as well as the implementation of both computational models and fMRI.

      We thank Reviewer 1 for their positive assessment of our manuscript.

      Weaknesses:

      In spite of the overall strengths of the study, I think there are a few areas in which the paper fell a bit short and could be improved.

      We are looking forward to improving our manuscript based on the Reviewers’ comments. According to eLife’s policy, here are our provisional replies as well as plans for changes.

      (1) While the framing and goal of this study were to investigate guilt and felt responsibility, the task implemented, a risky choice task with social conditions, has been used in similar ways in past research that was not addressed here. The novelty of this study would appear to be the additional happiness assessments, but it would be helpful to consider the changes noted in risk-taking behavior in the context of additional studies that have investigated changes in risky economic choice in social contexts (e.g., Arioli et al., 2023 Cerebral Cortex; Fareri et al., 2022 Scientific Reports).

      We certainly agree that several previously published studies have relied on risky choice tasks with social conditions. We will happily refer to the studies mentioned when discussing changes in risk-taking behaviour in our revised manuscript.

      (2) The authors note they assessed changes in risk preferences between social and solo conditions in two ways - by calculating a 'risk premium' and then by estimating rho from an expected utility model. I am curious why the authors took both approaches (this did not seem clearly justified, though I apologize if I missed it). Relatedly, in the expected utility approach, the authors report that since 'the number of these types of trials varied across participants', they 'only obtained reliable estimates for [gain and loss] trials in some participants' - in study 1, 22 participants had unreliable estimates and in study 2, 28 participants had unreliable estimates. Because of this, and because the task itself only had 20 gains, 20 losses, and 20 mixed gambles per condition, I wonder if the authors can comment on how interpretable these findings are in the Discussion. Other work investigating loss aversion has implemented larger numbers of trials to mitigate the potential for unreliable estimates (e.g., Sokol-Hessner et al., 2009).

      We agree that we have not clearly justified why we have taken two approaches to assess risk preferences. In short, both approaches have advantages and inconveniences when applied to our experiment. We will happily detail our reasons in the revised manuscript. Regarding the second point of this comment: the small number of reliable estimates is one of the reasons that we have used another approach to assess risk preferences. We would certainly have obtained more reliable estimates if we had implemented more trials. We will discuss the interpretability of all the risk preference estimates we used in the revised Discussion.
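      To illustrate how the two measures relate, here is a sketch under assumptions that may not match the exact models in the manuscript. With power utility u(x) = x<sup>ρ</sup> on gains, a gamble paying x with probability p (else 0) has certainty equivalent CE = (p·x<sup>ρ</sup>)<sup>1/ρ</sup>. The risk premium, expected value minus CE, is then positive for ρ < 1 (risk aversion), zero for ρ = 1, and negative for ρ > 1. The gamble values and function names are hypothetical.

```python
def certainty_equivalent(x: float, p: float, rho: float) -> float:
    """CE of a gamble paying x (> 0) with probability p, else 0,
    under the assumed power utility u(x) = x**rho with rho > 0."""
    return (p * x ** rho) ** (1.0 / rho)


def risk_premium(x: float, p: float, rho: float) -> float:
    """Expected value minus certainty equivalent (> 0 means risk averse)."""
    return p * x - certainty_equivalent(x, p, rho)


# Hypothetical 50/50 gamble for 20 points under three curvature values.
for rho in (0.5, 1.0, 1.5):
    print(rho, round(risk_premium(20.0, 0.5, rho), 2))
```

      Under this assumed model, each ρ maps onto a single premium for a given gamble. A premium can be computed directly from choices, however, whereas ρ must be fitted and needs enough trials of a given type (e.g., gain-only gambles) to be estimated reliably. This is one reason the two approaches can differ in how many participants yield usable estimates.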

      (3) One thing seemingly not addressed in the Discussion is the fact that the behavioral effect did not replicate significantly in study 2.

      We agree that we could have discussed in more detail the fact that there were (slight but significant) differences in risk preferences between the Solo and Social conditions in Study 1 but not in Study 2. While the absence of a significant difference in Study 2 is helpful for comparing the neural mechanisms involved in making decisions for oneself vs. for oneself and another person (because any differences could not be explained by differences in risk preferences), we certainly should expand our discussion of the differences in findings between the two studies, which we will do in the revised manuscript.

      (4) Regarding the computational models, the authors suggest that the Responsibility and Responsibility Redux models provided the best fit, but they are claiming this based on separate metrics (e.g., in study 1, the redux model had the lowest AIC, but the responsibility only model had the highest R^2; additionally, the basic model had the lowest BIC). I am wondering if the authors considered conducting a direct model comparison to statistically compare model fits.

      We agree that we should run formal, direct model comparison tests using for example chi-square or log-likelihood-ratio tests. We will do so in the revised manuscript.

      (5) In the reporting of imaging results, the authors report in a univariate analysis that a small cluster in the left anterior insula showed a stronger response to low outcomes for the partner as a result of participant choice rather than from partner choice. It then seems as though the authors performed small volume correction on this cluster to see whether it survived. If that is accurate, then I would suggest that this result be removed because it is not recommended to perform SVC where the volume is defined based on a result from the same whole-brain analysis (i.e., it should be done a priori).

      As indicated in the manuscript, the small insula cluster centered at [-28 24 -4] and shown in Figure 4F survived corrections for multiple tests within the anatomically-defined anterior insula (based on the anatomical maximum probability map described in Faillenot et al., 2017), which is independent of the result of our analysis. We agree that one should not (and we did not) define the correction volume based on the very results one is correcting; that would indeed be circular and misleading “double-dipping”. The anterior insula is one of the regions most frequently associated with guilt (see the explanations in our Introduction, which refers for example to Bastin et al., 2016; Lamm & Singer, 2010; Piretti et al., 2023). Thus we feel that performing small-volume correction within the anatomically-defined anterior insula is an acceptable approach to correct for multiple tests in this case. We fully acknowledge that, independently of any correction, the effect and the cluster are small. We will clarify these explanations in the revised manuscript.

      Reviewer #2 (Public review):

      Summary

      This manuscript focuses on the role of social responsibility and guilt in social decision-making by integrating neuroimaging and computational modeling methods. Across two studies, participants completed a lottery task in which they made decisions for themselves or for a social partner. By measuring momentary happiness throughout the task, the authors show that being responsible for a partner's bad lottery outcome leads to decreased happiness compared to trials in which the participant was not responsible for their partner's bad outcome. At the neural level, this guilt effect was reflected in increased neural activity in the anterior insula, and altered functional connectivity between the insula and the inferior frontal gyrus. Using computational modeling, the authors show that trial-by-trial fluctuations in happiness were successfully captured by a model including participant and partner rewards and prediction errors (a 'responsibility' model), and model-based neuroimaging analyses suggested that prediction errors for the partner were tracked by the superior temporal sulcus. Taken together, these findings suggest that responsibility and interpersonal guilt influence social decision-making.

      Strengths

      This manuscript investigates the concept of guilt in social decision-making through both statistical and computational modeling. It integrates behavioral and neural data, providing a more comprehensive understanding of the psychological mechanisms. For the behavioral results, data from two different studies is included, and although minor differences are found between the two studies, the main findings remain consistent. The authors share all their code and materials, leading to transparency and reproducibility of their methods.

      The manuscript is well-grounded in prior work. The task design is inspired by a large body of previous work on social decision-making and includes the necessary conditions to support their claims (i.e., Solo, Social, and Partner conditions). The computational models used in this study are inspired by previous work and build on well-established economic theories of decision-making. The research question and hypotheses clearly extend previous findings, and the more traditional univariate results align with prior work.

      The authors conducted extensive analyses, as supported by the inclusion of different linear models and computational models described in the supplemental materials. Psychological concepts like risk preferences are defined and tested in different ways, and different types of analyses (e.g., univariate and multivariate neuroimaging analyses) are used to try to answer the research questions. The inclusion and comparison of different computational models provide compelling support for the claim that partner prediction errors indeed influence task behavior, as illustrated by the multiple model comparison metrics and the good model recovery.

      We thank Reviewer 2 very much for their comprehensive description of our study and the positive assessment of our study and approach.

      Weaknesses

      As the authors already note, they did not directly ask participants to report their feelings of guilt. The decrease in happiness reported after a bad choice for a partner might thus be something other than guilt, for example, empathy or feelings of failure (not necessarily related to guilt towards the other person). Although the patterns of neural activity evoked during the task match with previously found patterns of guilt, there is no direct measure of guilt included in the task. This warrants caution in the interpretation of these findings as guilt per se.

      We fully agree that not directly asking participants about feelings of guilt is a clear limitation of our study. While we already mention this in our Discussion, we will happily expand our discussion of its consequences for the interpretation of our results along the lines described by the reviewer in the revised manuscript. We would like to thank Reviewer 2 for proposing these lines of thought.

      As most comparisons contrast the social condition (making the decision for your partner) against either the partner condition (watching your partner make their decision) or the solo condition (making your own decision), an open question remains of how agency influences momentary happiness, independent of potential guilt. Other open questions relate to individual differences in interpersonal guilt, and how those might influence behavior.

      We fully agree that the way agency influences happiness has not been discussed much in our manuscript so far, and we will happily do so in the revised manuscript. The same goes for individual differences in interpersonal guilt, which we have not investigated due to our relatively small sample sizes but which would certainly be worth investigating in subsequent work.

      This manuscript is an impressive combination of multiple approaches, but how these different approaches relate to each other and how they can aid in answering slightly different questions is not very clearly described. The authors could improve this by more clearly describing the different methods and their added value in the introduction, and/or by including a paragraph on implications, open questions, and future work in the discussion.

      We again thank the reviewer for their praise of our approach and fully agree that we can improve the description of the benefit of combining methods in the Introduction, which we will do in the revised manuscript. We will also include a paragraph on implications, open questions, and future work in the Discussion of the revised manuscript.

      However, taken together, this study provides useful insights into the neural and behavioral mechanisms of responsibility and guilt in social decision-making, and how they influence behavior.

      We again thank Reviewer 2 for their attentive reading and thoughtful comments and look forward to submitting our revised and improved manuscript.

    1. Author response:

      Reviewer 1:

      (1) We appreciate the reviewer’s suggestion to test a multi-attribute attentional drift-diffusion model (maaDDM) that does not constrain the taste and health weights to the range of 0 and 1 and will test such a model.

      (2) Similarly, we will follow the reviewer’s suggestion to address potential demand effects. First, we will add “order” (binary: hungry-sated or sated-hungry) as a predictor to our GLMM, to test for potential systematic effects of order on choices and response times. Second, we will split the participants by “order” and examine whether we see group differences in tasty and healthy decisions within the first testing session. Note that we already anticipate that looking at only 50% of the data and testing for a between-subject rather than within-subject effect is likely to reduce effect size and statistical sensitivity.

      (3) We thank the reviewer for their observant remark about faster tasty choices and potential markers in the drift rate. While our starting point models show that there might be a small starting point bias towards the taste boundary, which results in faster decisions, we will take a closer look at the simulated value differences obtained in our posterior predictive checks to see if the drift rate is systematically more extreme for tasty choices.

      (4) Regarding the mtDDM, we will verify that the relative starting time (rst) effects are minuscule. While we will follow the recommendation of correlating first fixations with rst, we would like to point out that a majority of fixations (see Figure 3b) and first fixations (see Figure S6b) are on food images. We will also provide a parameter recovery of the mtDDM.

      Reviewer 2:

      (1) We would like to confirm the reviewer’s interpretation that hungry people in negative calorie balance simply prefer more calories and would like to point to our supplementary analyses, in which we show that hunger state also increases the probability of higher wanted and higher caloric decisions (see SOM4, SOM5, Figure S4). Moreover, we agree that high caloric items might not be unhealthy, and we are happy to report the correlation between health ratings and objective caloric content, which shows the strong negative correlation in our dataset that our principal component analyses also hint at.

      Reviewer 3:

      (1) We agree that choosing tasty over healthy options under hunger may be evolutionarily adaptive. We will address the adaptiveness of this hunger-driven mechanism in our discussion, reiterating the distinction made in the introduction that this system may no longer be adaptive in our obesogenic environment, leading to suboptimal decisions.

      (2) We will address alternative explanations of the observed effects in our discussion with respect to the macro-nutritional content of the Shake and potential placebo effects arising from the shake vs no shake manipulation.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work shows that a specific adenosine deaminase protein in Dictyostelium generates the ammonia that is required for tip formation during Dictyostelium development. Cells with an insertion in the ADGF gene aggregate but do not form tips. A remarkable result, shown in several different ways, is that the ADGF mutant can be rescued by exposing the mutant to ammonia gas. The authors also describe other phenotypes of the ADGF mutant such as increased mound size, altered cAMP signalling, and abnormal cell type differentiation. It appears that the ADGF mutant has defects in the expression of a large number of genes, resulting in not only the tip defect but also the mound size, cAMP signalling, and differentiation phenotypes.

      Strengths:

      The data and statistics are excellent.

      Weaknesses:

      (1) The key weakness is understanding why the cells bother to use a diffusible gas like ammonia as a signal to form a tip and continue development.

      Diffusion of a gas can affect the signalling process of the entire colony of cells and will be quicker than other signaling mechanisms. A number of findings suggest that ammonia acts as both a local and long-range regulatory signal, integrating environmental and cellular cues to coordinate multicellular development. Ammonia serves as a crucial signalling molecule, influencing both multicellular organization and differentiation in Dictyostelium (Francis, 1964; Bonner et al., 1989; Bradbury and Gross, 1989). By raising the pH of the intracellular acidic vesicles of prestalk cells (Poole and Ohkuma, 1981; Gross et al, 1983), and the cytoplasm, ammonia is known to increase the speed of chemotaxing amoebae (Siegert and Weijer 1989; Van Duijn and Inouye, 1991), triggering multicellular movement (Bonner et al., 1988, 1989) to favor tipped mound development. The slug tip is known to release ammonia, while the slime sheath at the back of the slug prevents diffusion, thus maintaining high ammonia levels (Bonner et al., 1989) to promote prespore differentiation (Newell et al., 1969). Ammonia has been found to favor slug migration rather than fruiting (Schindler and Sussman, 1977) and thus, tip-derived ammonia may stimulate synchronized development of the entire colony. The tip exhibits negative chemotaxis to ammonia, potentially directing the slugs away from each other to ensure equal spacing of fruiting bodies (Feit and Sollitto, 1987).

      Ammonia released in pulses acts as a long-distance signalling molecule between colonies of yeast cells, indicating depletion of nutrient resources and promoting synchronous development (Palkova et al., 1997; Palkova and Forstova, 2000). A similar mechanism may be at play to influence neighbouring Dictyostelium colonies. Furthermore, ammonia produced in millimolar concentrations (Schindler and Sussman, 1977) may also ward off predators in soil, as observed in Streptomyces symbionts of leaf-cutting ants that inhibit fungal pathogens (Dhodary and Spiteller, 2021). Additionally, ammonia may be recycled into amino acids within starving Dictyostelium cells to support survival and differentiation, as observed in breast cancer cells (Spinelli et al., 2017). Therefore, using a diffusible gas like ammonia as a signalling molecule is likely to have bioenergetic advantages. Ammonia is a natural metabolic byproduct of amino acid catabolism and other cellular processes, making it readily available without requiring additional energy for synthesis. Instead of producing a dedicated signalling molecule, cells can exploit an existing by-product for developmental regulation.

      (2) The rescue of the mutant by adding ammonia gas to the entire culture indicates that ammonia conveys no positional information within the mound.

      Ammonia is known to influence rapid patterning of Dictyostelium cells confined in a restricted environment (Sawai et al., 2002). Both neutral red staining (a marker for prestalk cells and anterior-like cells, ALCs) (Fig. S2) and the prestalk marker ecmA/ecmB expression (Fig. 8C) in the adgf mutants suggest that the mounds have differentiated prestalk cells but are blocked in development. The mound arrest phenotype can be reversed by exposing the adgf mutant mounds to ammonia.

      Based on cell cycle phases, there is a dichotomy of cell types that biases cell fate to prestalk or prespore (Weeks and Weijer, 1994; Jang and Gomer, 2011). Prestalk cells are enriched in acidic vesicles, and ammonia, by raising the pH of these vesicles and the cytoplasm (Davies et al 1993; Van Duijn and Inouye 1991), plays an active role in collective cell movement (Bonner et al., 1989). Thus, ammonia reinforces or maintains the positional information by elevating cAMP levels, favouring prespore differentiation (Bradbury and Gross, 1989; Riley and Barclay, 1990; Hopper et al., 1993).

      (3) By the time the cells have formed a mound, the cells have been starving for several hours, and desperately need to form a fruiting body to disperse some of themselves as spores, and thus need to form a tip no matter what.

      When the adgf mutants were exposed to ammonia just after tight mound formation, tips developed within 4 h (Fig. 6). In contrast, adgf mounds not exposed to ammonia remained at the mound stage for at least 30 h. This demonstrates that starvation alone is not sufficient to drive tip development and ammonia serves as a cue that promotes the transition from mound to tipped mound formation. 

      Many mound arrest mutants are blocked in development and do not proceed to form fruiting bodies (Carrin et al., 1994). Furthermore, not all the mound arrest mutants tested in this study were rescued by ADA enzyme (Fig. S3 A); they remain as mounds without dispersing as spores. This suggests that mound arrest in Dictyostelium can result from multiple underlying defects, whereas ammonia is an important factor controlling the transition from mound to tip formation.

      (4) One can envision that the local ammonia concentration is possibly informing the mound that some minimal number of cells are present (assuming that the ammonia concentration is proportional to the number of cells), but probably even a minuscule fruiting body would be preferable to the cells compared to a mound. This latter idea could be easily explored by examining the fate of the ADGF cells in the mound - do they all form spores? Do some form spores?

      Or perhaps the ADGF is secreted by only one cell type, and the resulting ammonia tells the mound that for some reason that cell type is not present in the mound, allowing some of the cells to transdifferentiate into the needed cell type. Thus, elucidating if all or some cells produce ADGF would greatly strengthen this puzzling story.

      A fraction of adgf mounds form bulkier spore heads by the end of 36 h as shown in Fig. 3. This late recovery may be due to the expression of other ADA isoforms. Mixing WT and adgf mutant cell lines results in a slug with the mutants occupying the prestalk region (Fig. 9) suggesting that WT ADGF favours prespore differentiation. However, it is not clear if ADGF is secreted by a particular cell type, as adenosine can be produced by both cell types, and the activity of three other intracellular ADAs may vary between the cell types. To address whether adgf expression is cell type-specific, we will isolate prestalk and prespore cells, and thereafter examine adgf expression in each population.

      ADGF activity is likely to be higher in the tip to remove excess adenosine, the tip-inhibiting molecule (Wang and Schaap, 1985). Moreover, our results show that adgf<sup>-</sup> cells with high adenosine preferentially migrate to the prestalk rather than the prespore region when mixed with WT cells. Ammonia generated from adenosine deamination could thus drive tip development and prespore differentiation.

      Reviewer #2 (Public review):

      Summary:

      The paper describes new insights into the role of adenosine deaminase-related growth factor (ADGF), an enzyme that catalyses the breakdown of adenosine into ammonia and inosine, in tip formation during Dictyostelium development. The ADGF null mutant has a pre-tip mound arrest phenotype, which can be rescued by the external addition of ammonia. Analysis suggests that the phenotype involves changes in cAMP signalling possibly involving a histidine kinase dhkD, but details remain to be resolved.

      Strengths:

      The generation of an ADGF mutant showed a strong mound arrest phenotype and successful rescue by external ammonia. Characterization of significant changes in cAMP signalling components, suggesting low cAMP signalling in the mutant, and identification of the histidine kinase dhkD as a possible component of the transduction pathway. Identification of a change in cell type differentiation towards prestalk fate.

      Weaknesses:

      (1) Lack of details on the developmental time course of ADGF activity and cell-type-specific differences in ADGF expression.

      ADGF expression was examined at 0, 8, 12, and 16 h (Fig. 1), and the total ADA activity was assayed at 12 and 16 h (Fig. 4). As per the reviewer’s suggestion, we have now included the 12 h data (Fig. 4A) to provide additional insights into the kinetics of ADGF activity. adgf expression was highest at 16 h, and hence the ADA assay was carried out at that time point. However, the ADA assay will not exclusively reflect ADGF activity, since it also reports the activity of the three other isoforms.

      A fraction of adgf<sup>-</sup> mounds form bulkier spore heads by the end of 36 h as shown in Fig. 3. This late recovery may be due to the expression of the other ADA isoforms. Mixing WT and adgf mutant cell lines results in a slug with the mutants occupying the prestalk region (Fig. 9), suggesting that WT adgf favours prespore differentiation.

      However, it is not clear if ADGF is secreted by a particular cell type, as adenosine can be produced by both cell types, and the activity of the other three intracellular ADAs may vary between the cell types. To address whether adgf expression is cell-type-specific, we will isolate prestalk and prespore cells, and thereafter examine adgf expression in each population.

      ADGF activity is likely to be higher in the tip to remove excess adenosine, the tip-inhibiting molecule (Wang and Schaap, 1985). Moreover, our results show that adgf<sup>-</sup> cells with high adenosine preferentially migrate to the prestalk rather than the prespore region when mixed with WT cells.

      (2) The absence of measurements to show that ammonia addition to the null mutant can rescue the proposed defects in cAMP signalling.

      The cAMP levels were measured at two time points 8 h and 12 h in the mutant. The adgf mutant has lower ammonia levels (Fig. 6), diminished acaA expression (Fig. 7) and reduced cAMP levels (Fig. 7) in comparison to WT at both 12 and 16 h of development. Since ammonia is known to increase cAMP levels (Riley and Barclay, 1990; Feit et al., 2001), adding ammonia to the mutant is likely to increase acaA expression, thereby rescuing the defects in cAMP signalling.

      (3) No direct measurements in the dhkD mutant to show that it acts upstream of adgf in the control of changes in cAMP signalling and tip formation.

      The histidine kinases dhkD and dhkC are reported to modulate phosphodiesterase RegA activity, thereby maintaining cAMP levels (Singleton et al., 1998; Singleton and Xiong, 2013). By activating RegA, dhkD ensures proper cAMP distribution within the mound, which is essential for the patterning of prestalk and prespore cells, as well as for tip formation (Singleton and Xiong, 2013). Therefore, exposing dhkD mutants to ammonia is likely to regulate cAMP signalling and thereby tip formation. We will address this issue by measuring cAMP levels in the dhkD mutant.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Overview:

      We appreciate all the constructive comments from the reviewer and the reviewing editor, as their suggestions have significantly improved our manuscript. In response to their comments, we have made several key revisions: First, we have performed new colocalization analyses between the active zone marker UNC-10::GFP and all UNC-13L variants (UNC-13L, UNC-13L<sup>HK</sup>, UNC-13L<sup>D1-5N</sup>, and UNC-13L<sup>HK+D1-5N</sup>, all tagged with mApple). These results confirm that the mutations do not affect synaptic localization. Second, we have provided a clearer explanation of the “gain-of-function” term used in this study, emphasizing that it reflects an increased SV release due to C1-C2B module dysfunction rather than a single mechanistic state. Third, we have expanded the discussion on the physiological implications of the C1-C2B model, particularly its role in regulating synaptic transmission under varying neuronal activity conditions. Finally, to improve clarity and focus, we have removed unnecessary speculative discussions, ensuring that the revised manuscript centers on the most relevant findings.

      We have reorganized the manuscript to incorporate these new results into the figures and text. Full responses to all reviewer comments are provided below. We hope that the reviewer and the editor find these revisions satisfactory and that our manuscript is now suitable for publication in eLife.

      Joint Public Review:

      Summary:

      In this manuscript, the authors investigate how different domains of the presynaptic protein UNC-13 regulate synaptic vesicle release in the nematode C. elegans. By generating numerous point mutations and domain deletions, they propose that two membrane-binding domains (C1 and C2B) can exhibit "mutual inhibition," enabling either domain to enhance or restrain transmission depending on its conformation. The authors also explore additional N-terminal regions, suggesting that these domains may modulate both miniature and evoked synaptic responses. From their electrophysiological data, they present a "functional switch" model in which UNC-13 potentially toggles between a basal state and a gain-of-function state, though the physiological basis for this switch remains partly speculative.

      Strengths:

      (1) The authors conduct a thorough exploration of how mutations in the C1, C2B, and other regulatory domains affect synaptic transmission. This includes single, double, and triple mutations, as well as domain truncations, yielding a large, informative dataset.

      (2) The study includes systematically measuring both spontaneous and evoked synaptic currents at neuromuscular junctions, under various experimental conditions (e.g., different Ca²⁺ levels), which strengthens the reliability of their functional conclusions.

      (3) Findings that different domain disruptions produce distinct effects on mEPSCs, mIPSCs, and evoked EPSCs suggest UNC-13 may adopt an elevated functional state to regulate synaptic transmission.

      Weaknesses:

      It remains unclear whether the various domain alterations truly converge on a single "gain-of-function" state or instead represent multiple pathways for enhancing UNC-13 activity. Different mutations selectively affect spontaneous or evoked release, suggesting that each variant may not share the same underlying mechanism. Moreover, many conclusions rely on combining domain deletions or point mutations, yet the electrophysiological data show distinct outcomes across EPSCs, IPSCs, mini, and evoked responses. This raises questions about whether these manipulations all act on the same pathway and whether their observed additivity or suppression genuinely reflects a single mechanistic process. A unifying model, or at least a clearer explanation of why the authors infer one mechanistic state across different domain manipulations, would strengthen the paper's conclusions.

      We appreciate the comment and understand the potential confusion regarding the use of the term "gain-of-function" in the manuscript. To clarify, the gain-of-function state described in this study does not refer to a single specific mechanistic change in UNC-13 but rather to a high synaptic vesicle (SV) release state achieved by disrupting the C1-C2B module, either through dysfunction of the C1 domain or of the C2B domain (as seen with the HK and DN mutations).

      Our findings support a "seesaw" model in which the C1 and C2B domains maintain a dynamic balance in their interaction with the plasma membrane, binding to DAG and PIP2. This balance may increase the energy barrier for SV release, preventing excessive neurotransmitter release under basal conditions. However, high neuronal activity may disrupt the C1-C2B toggle and shift it into an unbalanced state, thereby enhancing synaptic transmission (i.e., the gain-of-function state). To address these concerns, we have provided a clearer explanation of this functional switch in the revised version of the manuscript (page 27).

      Regarding the differences between spontaneous and evoked neurotransmitter release, our previous studies have revealed that these two forms of release do not always respond similarly to various unc-13 mutations. This is a common phenomenon observed in other synaptic protein mutants, including synaptotagmin, tomosyn, and complexin, which indicates distinct yet partially overlapping regulatory mechanisms. Our model is well supported by most of the electrophysiological results from HK, DN, and HK+DN mutations across different unc-13 isoforms (UNC-13L, UNC-13S, UNC-13R, UNC-13ΔC2A, UNC-13ΔX). The main exception is that in UNC-13ΔX<sup>HK+DN</sup> mutants, the changes in mEPSCs and mIPSCs differ from those observed in evoked EPSCs. This suggests that the mechanisms regulating the functional switch of unc-13 may differ slightly between spontaneous and evoked release. Since the X region of unc-13 and Munc13 remains largely uncharacterized, our findings provide intriguing insights into its potential functional role.

      The manuscript proposes that UNC-13 toggles from a basal to a "gain-of-function" state under normal synaptic activity. However, it does not address when or how this switch might occur in vivo, since it is demonstrated principally via artificial mutations. Providing direct evidence or additional discussion of such switching under physiological conditions would be particularly informative.

      What is the physiological significance of the proposed gain-of-function state? The data suggest that certain mutants (e.g., HK+D1-5N) lacking the gain-of-function state can still support synaptic transmission at wild-type levels. How do the authors reconcile this with the idea that the gain-of-function state plays a critical role at the synapse?

      We appreciate these comments. While our model is mainly based on the dysfunction of the C1-C2B module (through HK and DN mutations), it provides a potential physiological framework for understanding how the structural balance of C1-C2B relates to the variability of synaptic transmission in the nervous system. In the CNS, synaptic transmission is highly variable, and the temporal pattern of the presynaptic activity may require dynamic switching of the fusion machinery, including UNC-13, between different functional modes, thereby triggering synaptic transmission at various levels. Our model suggests that under conditions of high neuronal activity, the C1-C2B module may transition from a balanced to an unbalanced state (gain-of-function state), thereby enhancing synaptic transmission.

      Regarding the physiological significance of the gain-of-function state, we acknowledge that certain mutants (e.g., HK+D1-5N) lacking this state can still support wild-type levels of synaptic transmission. This observation suggests that the gain-of-function state may not be strictly required for baseline synaptic function but rather plays a modulatory role under specific conditions, such as heightened neuronal activity or synaptic plasticity. Further investigations will be needed to determine the precise in vivo triggers and functional consequences of this switch under physiological conditions. In future work, we will also focus on several linker regions (between C1 and C2B, and between C2B and MUN) to investigate their potential roles in regulating synaptic transmission and their broader functional significance in UNC-13 dynamics.

      The authors determined the fluorescence intensity of mApple-tagged UNC-13 variants (Figure 1J-K and Figure 7J-K), finding no significant changes compared to the wild-type. However, a more detailed analysis of the density or distribution of fluorescent puncta in axons could clarify whether certain mutations alter the localization of UNC-13 at synapses. Demonstrating colocalization with wild-type UNC-13 (or another presynaptic marker) would help rule out mislocalization effects.

      We appreciate the comment. In response, we have included a more detailed analysis of the synaptic localization of both wild-type and mutated UNC-13L in the revised manuscript. Our data show that in all scenarios, UNC-13 proteins exhibit strong colocalization with the active zone marker UNC-10::GFP (Figure 1L). Along with the fluorescence intensity data in Figure 1J, our findings indicate that the C1 and C2B mutations do not affect the expression level or the localization of UNC-13 at synapses. These results have been incorporated into the revised manuscript (page 8) and in Figure 1L.

      The study mainly relies on extrachromosomal transgenes, which can show variable copy numbers and expression levels among individual worm strains. This variability might complicate interpretation, as differences in expression could mask or exaggerate certain phenotypes.

      We agree that the expression levels of synaptic proteins can influence synaptic transmission levels. However, given the large number of mutations and truncations employed in this study, generating single-copy rescue lines for all transgenic strains would be a significant undertaking. On average, we need to microinject 50-100 worms to obtain one single-copy line, whereas injecting only 5-10 worms allows us to generate at least three independent extrachromosomal arrays. In our previous work, we found that synaptic transmission levels are comparable between various extrachromosomal rescue arrays of unc-13 and their single-copy rescue lines (e.g., UNC-13L, UNC-13S, UNC-13R, UNC-13ΔC2A, UNC-13ΔC2B, etc.). In future studies, we aim to use single-copy expression or CRISPR-based methods to introduce deletions or mutations in various synaptic proteins.

      Finally, the discussion is somewhat diffuse. Streamlining the text to focus on the most direct connections would help readers pinpoint the key conclusions and open questions.

      We appreciate the comment. As suggested, we have refined the discussion section. Specifically, we have removed the last part of the discussion (Functional roles of the linkers in UNC-13).  

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Clarify the "Gain-of-Function" State. Provide stronger justification or explicit discussion of whether all manipulations that enhance SV release truly correspond to the same mechanistic state or if multiple conformational states might be at play.

      The “gain-of-function” state in this manuscript refers to a specific conformational status of UNC-13 that enhances synaptic vesicle (SV) release probability (both spontaneous and evoked) as a result of mutations (HK and DN) in the C1 and C2B domains. This effect is observed across multiple UNC-13 isoforms, including UNC-13L, UNC-13S, and UNC-13R. Prior studies from our group and others have demonstrated that C1 and C2B exhibit conserved functions in regulating synaptic transmission (Li et al., 2019, Cell Reports; Liu et al., 2021, Cell Reports; Michelassi et al., 2017, Neuron), supporting the idea that these domains share a common mechanism for modulating SV release. Given that C1 and C2B act as a functional unit (Michelassi et al., 2017, Neuron; and this study), we define all synaptic states induced by the dysfunction of these two domains as the "gain-of-function" mode.

      However, it is important to note that this classification does not apply to high-release probability states induced by mutations in other domains.

      The concept of a gain-of-function state due to C1 and C2B dysfunction has been previously proposed in studies of Munc13. Basu et al. (2007, Journal of Neuroscience) demonstrated that the H567K mutation in Munc13-1 C1 increases both spontaneous and evoked release probability, leading to a gain-of-function mode. Similarly, work from the Südhof group showed that KW and DN mutations in Munc13-1 C2B also enhance release probability, thereby inducing a gain-of-function state (Shin et al., 2010, Nature Structural & Molecular Biology). Our recent findings further support this idea, showing that UNC-13 C2B D3,4N (Li et al., 2019, Cell Reports; Liu et al., 2021, Cell Reports; Michelassi et al., 2017, Neuron) and the newly identified D1-5N mutation (this study) significantly elevate SV release, consistent with the D1,2N mutations reported by Shin et al.

      Overall, our study integrates and extends previous findings, providing strong evidence that the C1 and C2B domains function as a regulatory switch between a basal physiological mode, a gain-of-function mode (enhanced release), and a loss-of-function mode (impaired release). This framework advances our understanding of how C1 and C2B dysfunction affects synaptic transmission and plasticity.

      (2) Add comparisons to wild-type UNC-13L: When presenting data for deletions/mutants as "controls," include a visual reference (e.g., dashed line in figures) showing wild-type UNC-13L levels. This will help readers see whether each construct is above or below the normal activity baseline.

      As suggested, a dashed line showing the level of UNC-13L has been added to the bar graphs of all evoked EPSCs. The functional switch model is well supported by the results of the evoked EPSCs.

      (3) Mutant and wild-type UNC-13 colocalization analysis: Demonstrating whether each mutant localizes robustly to synapses, in comparison to wild-type UNC-13, would bolster the interpretation of electrophysiological changes. If the authors have these data, adding them would address the possibility of mislocalization.

      We agree with the reviewer that it would be valuable to address the possibility of mislocalization. However, in our previous colocalization work with UNC-13 mutants, we found that neither deleting the X, C1 and C2B domains in UNC-13L nor deleting the C1 and C2B domains in UNC-13MR or UNC-13R altered synaptic colocalization with the active zone protein UNC-10/RIM (Li et al., 2019; Liu et al., 2021), suggesting that the C1 and C2B domains of UNC-13 are not involved in regulating its localization. Thus, mutations in the C1 and C2B domains are unlikely to cause protein mislocalization at synapses.

      (4) If possible, add an analysis using single-copy transgenes to confirm that variability in extrachromosomal array expression does not qualitatively change the conclusions.

      We strongly agree with the reviewer that single-copy transgenes would provide more stable protein expression levels and further consolidate our conclusions. However, several factors give us confidence that the extrachromosomal array rescue approach does not introduce significant variability in our results. First, our prior research has shown that SV release levels are generally comparable between extrachromosomal arrays carrying various unc-13 transgenes and their corresponding single-copy rescue lines (e.g., UNC-13L, UNC-13S, UNC-13R, UNC-13ΔC2A, and UNC-13ΔC2B). Second, the major conclusions in this study are drawn from highly consistent and robust changes in SV release between different rescue lines (e.g., UNC-13L<sup>HK+DN</sup> vs UNC-13L<sup>DN</sup>; UNC-13S<sup>HK+DN</sup> vs UNC-13S<sup>HK</sup> or UNC-13S<sup>DN</sup>). Third, our imaging data indicate that protein levels are indistinguishable between the different unc-13 rescue arrays carrying C1 and C2B mutations, further supporting the validity of our findings.

      Additionally, because of our recent relocation to a new institute, we are still setting up our microinjection system, and generating single-copy transgenes for all the extrachromosomal arrays used in this study would require significant time. We appreciate the reviewer's understanding of our current situation. In future studies of unc-13 and other synaptic proteins, we plan to use single-copy expression rather than extrachromosomal arrays.

      (5) Reduce the length and speculation in the Discussion. A concise discussion that focuses on the most direct implications of the present findings will help improve the readability of this paper.

      We appreciate the comment. As suggested, we have refined the discussion section.

      Specifically, the last part of the discussion (Functional roles of the linkers in UNC-13) was removed.

      (6) Minor formatting detail: In Figure 5C (left panel), adjust the y-axis label to ensure it aligns properly and improves clarity.

      We appreciate the reviewer’s suggestion and have adjusted the y-axis label accordingly in the revised version (see revised Figure 5).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study introduces a useful deep learning-based algorithm that tracks animal postures with reduced drift by incorporating transformers for more robust keypoint detection. The efficacy of this new algorithm for single-animal pose estimation was demonstrated through comparisons with two popular algorithms. However, the analysis is incomplete and would benefit from comparisons with other state-of-the-art methods and consideration of multi-animal tracking.

      First, we would like to express our gratitude to the eLife editors and reviewers for their thorough evaluation of our manuscript. ADPT aims to improve the accuracy of body point detection and tracking in animal behavior, facilitating more refined behavioral analyses. The insights provided by the reviewers have greatly enhanced the quality of our work, and we have addressed their comments point-by-point.

      In this revision, we have included additional quantitative comparisons of multi-animal tracking capabilities between ADPT and other state-of-the-art methods. Specifically, we have added evaluations involving homecage social mice and marmosets to comprehensively showcase ADPT’s advantages from various perspectives. This additional analysis will help readers better understand how ADPT effectively overcomes point drift and expands its applicability in the field.

      Reviewer #1:

      In this paper, the authors introduce a new deep learning-based algorithm for tracking animal poses, especially in minimizing drift effects. The algorithm's performance was validated by comparing it with two other popular algorithms, DeepLabCut and LEAP. The accessibility of this tool for biological research is not clearly addressed, despite its potential usefulness. Researchers in biology often have limited expertise in deep learning training, deployment, and prediction. A detailed, step-by-step user guide is crucial, especially for applications in biological studies.

      We appreciate the reviewer's acknowledgment of our work. While ADPT demonstrates superior performance compared with DeepLabCut and SLEAP, we recognize that the absence of a user-friendly interface may hinder its broader application, particularly for users with a background solely in biology. In this revision, we have expanded the command-line user tutorial into a clear, step-by-step guide. Additionally, we have developed a simple graphical user interface (GUI) to support users who may not have expertise in deep learning, thereby making ADPT more accessible for biological research.

      The proposed algorithm focuses on tracking and is compared with DLC and LEAP, which are more adept at detection rather than tracking.

      In the field of animal pose estimation, the distinction between detection and tracking is often blurred. For instance, the title of the paper "SLEAP: A deep learning system for multi-animal pose tracking" refers to "tracking," while "detection" is characterized as "pose estimation" in the body text. Similarly, "Multi-animal pose estimation, identification, and tracking with DeepLabCut" uses "tracking" in the title, yet "detection" is also mentioned in the pose estimation section. We acknowledge that referencing these articles may have contributed to potential confusion.

      To address this, we have clarified the distinction between "tracking" and "detection" in the Results section under "Anti-drift pose tracker" (see lines 118-119). In this paper, we now explicitly use "track" to refer to the tracking of all body points or poses of an individual, and "detect" for specific keypoints.

      Reviewer #1 recommendations:

      (1) DLC and LEAP are mainly good in detection, not tracking. The authors should compare their ADPT algorithm with idtracker.ai, ByteTrack, and other advanced tracking algorithms, including recent track-anything algorithms.

      (2) DeepPoseKit is outdated and no longer maintained; a comparison with the T-REX algorithm would be more appropriate.

      We appreciate the reviewer's suggestion for a more comprehensive comparison and acknowledge the importance of including these advanced tracking algorithms. However, we have not yet found suitable publicly available datasets for such comparative testing. We appreciate this insight and will consider incorporating T-REX into future comparisons.

      (3) The authors primarily compared their performance using custom data. A systematic comparison with published data, such as the dataset reported in the paper "Multi-animal pose estimation, identification, and tracking with DeepLabCut," is necessary. A detailed comparison of the performances between ADPT and DLC is required.

      In the previous version of our manuscript, we included the SLEAP single-fly public dataset and the OMS_dataset from OpenMonkeyStudio for performance comparisons. We recognize that these datasets were not comprehensive. In this revision, we have added the marmoset dataset from "Multi-animal pose estimation, identification, and tracking with DeepLabCut" and a custom homecage social mice dataset to strengthen our comparative analysis of multi-animal pose estimation performance. This comparison shows that ADPT outperforms both DLC and SLEAP, as discussed in the Results section under "ADPT can be adapted for end-to-end pose estimation and identification of freely social animals" (Figure 1; see lines 303-332).

      (4) Given the focus on biological studies, an easy-to-use interface and introduction are essential.

      In this revision, we have not only developed a GUI for ADPT but also included a more detailed tutorial. This can be accessed at https://github.com/tangguoling/ADPT-TOOLBOX

      Reviewer #2:

      The authors present a new model for animal pose estimation. The core feature they highlight is the model's stability compared to existing models in terms of keypoint drift. The authors test this model across a range of new and existing datasets. The authors also test the model with two mice in the same arena. For the single animal datasets the authors show a decrease in sudden jumps in keypoint detection and the number of undetected keypoints compared with DeepLabCut and SLEAP. Overall average accuracy, as measured by root mean squared error, generally shows similar but sometimes superior performance to DeepLabCut and better performance compared to SLEAP. The authors confusingly don't quantify the performance of pose estimation in the multi (two) animal case instead focusing on detecting individual identity. This multi-animal model is not compared with the model performance of the multi-animal mode of DeepLabCut or SLEAP.

      We appreciate the reviewer's thoughtful assessment of our manuscript. Our study focuses on addressing the issue of keypoint drift prevalent in animal pose estimation methods like DeepLabCut and SLEAP. During the model design process, we discovered that the structure of our model also enhances performance in identifying multiple animals. Consequently, we included some results related to multi-animal identity recognition in our manuscript.

      We are currently working to broaden the applicability of ADPT to multi-animal pose estimation and identity recognition. Given that our manuscript emphasizes pose estimation, we have added a comparison of anti-drift performance in multi-animal scenarios in this revision. This quantifies ADPT's capability to mitigate drift in multi-animal pose estimation.

      Using our custom Homecage social mice dataset, we compared ADPT with DeepLabCut and SLEAP. The results indicate that ADPT achieves more accurate anti-drift pose estimation for two mice, with superior keypoint detection accuracy. Furthermore, we also evaluated pose estimation accuracy on the publicly available marmoset dataset, where ADPT outperformed both DeepLabCut and SLEAP. These findings are discussed in the Results section under "ADPT can be adapted for end-to-end pose estimation and identification of freely social animals."

      The first is a tendency to make unsubstantiated claims that suggest either model performance that is untested or misrepresents the presented data, or suggest excessively large gaps in current SOTA capabilities. One obvious example is in the abstract when the authors state ADPT "significantly outperforms the existing deep-learning methods, such as DeepLabCut, SLEAP, and DeepPoseKit." All tests in the rest of the paper, however, only discuss performance with DeepLabCut and SLEAP, not DeepPoseKit. At this point, there are many animal pose estimation models so it's fine they didn't compare against DeepPoseKit, but they shouldn't act like they did.

      We appreciate the reviewer's feedback regarding unsubstantiated claims in our manuscript. Upon careful review, we acknowledge that our previous revisions inadvertently included statements that may misrepresent our model's performance. In particular, we have revised the abstract to eliminate the mention of DeepPoseKit, as our comparisons focused exclusively on DeepLabCut and SLEAP.

      In addition to this correction, we have thoroughly reviewed the entire manuscript to address other instances of ambiguity and ensure that our claims are well-supported by the data presented. Thank you for bringing this to our attention; we are committed to maintaining the integrity of our claims throughout the paper.

      In terms of making claims that seem to stretch the gaps in the current state of the field, the paper makes some seemingly odd and uncited statements like "Concerns about the safety of deep learning have largely limited the application of deep learning-based tools in behavioral analysis and slowed down the development of ethology" and "So far, deep learning pose estimation has not achieved the reliability of classical kinematic gait analysis" without specifying which classical gait analysis is being referred to. Certainly, existing tools like DeepLabCut and SLEAP are already widely cited and used for research.

      In this revision, we have carefully reviewed the entire manuscript and addressed the instances of seemingly odd and unsubstantiated claims. Specifically, we have revised the statements "largely limited" to "limited" to ensure accuracy and clarity. Additionally, we thoroughly reviewed the citation list to ensure proper attribution, incorporating references such as "A deep learning-based toolbox for Automated Limb Motion Analysis (ALMA) in murine models of neurological disorders" to better substantiate our claims and provide a clearer context.

      We have also added an additional section to comprehensively discuss the applications of widely-used tools like DeepLabCut and SLEAP in behavioral research. This new section elaborates on the challenges and limitations researchers encounter when applying these methods, highlighting both their significant contributions and the areas where improvements are still needed.

      The other main weakness in the paper is the validation of the multi-animal pose estimation. The core point of the paper is pose estimation and anti-drift performance and yet there is no validation of either of these things relating to multi-animal video. All that is quantified is the ability to track individual identity with a relatively limited dataset of 10 mice IDs with only two in the same arena (and see note about train and validation splits below). While individual tracking is an important task, that literature is not engaged with (i.e. papers like Walter and Couzin, eLife, 2021: https://doi.org/10.7554/eLife.64000) and the results in this paper aren't novel compared to that field's state of the art. On the other hand, while multi-animal pose estimation is also an important problem the paper doesn't engage with those results either. The two methods already used for comparison in the paper, SLEAP and DeepPoseKit, already have multi-animal models and multi-animal annotated datasets but none of that is tested or engaged with in the paper. The paper notes many existing approaches are two-step methods, but, for practitioners, the difference is not enough to warrant a lack of comparison.

      We appreciate the reviewer's insights regarding the validation of multi-animal pose estimation in our paper. While our primary focus has been on pose estimation and anti-drift performance, we recognize the importance of validating these aspects within the context of multi-animal videos.

      In this revision, we have included a comparison of ADPT's anti-drift performance in multi-animal pose estimation, using our custom Homecage social mouse dataset (Figure 1A). Our findings indicate that ADPT achieves more accurate pose estimation for two mice while significantly reducing keypoint drift, outperforming both DeepLabCut and SLEAP (see lines 311-322). We trained each model three times, and the figure presents the results from one of those training runs. We calculated the average RMSE between predictions and manual labels: ADPT achieved an average RMSE of 15.8 ± 0.59 pixels, whereas DeepLabCut (DLC) and SLEAP yielded RMSEs of 113.19 ± 42.75 pixels and 94.76 ± 1.95 pixels, respectively (Figure 1C). Using the DLC evaluation metric across all body parts of the mice, ADPT achieved an accuracy of 6.35 ± 0.14 pixels, compared with 7.49 ± 0.2 pixels for DLC (Figure 1D). Using the SLEAP evaluation metric across all body parts, ADPT achieved 8.33 ± 0.19 pixels, compared with 9.82 ± 0.57 pixels for SLEAP (Figure 1E).
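      To make the RMSE comparison concrete, the following is a minimal sketch of one common way to compute a keypoint RMSE. It assumes that RMSE is the square root of the mean squared Euclidean distance between predicted and manually labelled (x, y) keypoints, pooled over frames and body parts, with unlabelled keypoints stored as NaN and skipped. This is an illustration only: the function name keypoint_rmse is hypothetical, and the exact DLC and SLEAP evaluation metrics cited above are not reproduced here.

```python
import numpy as np


def keypoint_rmse(pred, label):
    """Hypothetical helper: RMSE in pixels between predicted and labelled keypoints.

    pred, label: array-likes of shape (n_frames, n_keypoints, 2) holding (x, y).
    Keypoints whose label is NaN (not annotated) are excluded from the mean.
    """
    pred = np.asarray(pred, dtype=float)
    label = np.asarray(label, dtype=float)
    # Euclidean distance per keypoint, shape (n_frames, n_keypoints);
    # any NaN coordinate makes the corresponding distance NaN.
    dist = np.linalg.norm(pred - label, axis=-1)
    valid = ~np.isnan(dist)
    # Root of the mean squared distance over all valid keypoints.
    return float(np.sqrt(np.mean(dist[valid] ** 2)))
```

      For example, if every prediction is offset from its label by (3, 4) pixels, each distance is 5 pixels and the RMSE is 5. A mean ± SD over several training runs, as reported above, would then be the mean and standard deviation of this value across the runs.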

      Furthermore, we have conducted pose estimation accuracy evaluations on the publicly available marmoset dataset from DeepLabCut, where ADPT also demonstrated superior performance compared to DeepLabCut and SLEAP. These results can be found in the "ADPT can be adapted for end-to-end pose estimation and identification of freely social animals" section of the Results. (see lines 323-329)

      We acknowledge the existing literature on multi-animal tracking, such as the work by Walter and Couzin (2021). While individual tracking is crucial, our primary focus lies in the effective tracking of animal poses and in minimizing drift during this process. This dual emphasis on pose tracking and anti-drift performance distinguishes our work and aligns with ongoing advancements in the field. Engaging with this literature highlights the importance of placing our results within the broader tracking field: although our findings may overlap with existing methods, our specific focus on improving tracking stability and reducing drift is a valuable contribution. Thank you for your valuable feedback, which has helped us improve the robustness of our manuscript.

      The authors state that "The evaluation of our social tracking capability was performed by visualizing the predicted video data (see supplement Videos 3 and 4)." While the authors report success maintaining mouse ID, when one actually watches the key points in the video of the two mice (only a single minute was used for validation) the pose estimation is relatively poor with tails rarely being detected and many pose issues when the mice get close to each other.

      We acknowledge that there are indeed challenges in pose estimation, particularly when the two mice get close to each other, leading to tracking failures and infrequent detection of tails in the predicted videos. The reasons for these issues can be summarized as follows:

      Lack of Training Data from Real Social Scenarios: The training data used for the social tracking assessment were primarily derived from the Mix-up Social Animal Dataset, which does not fully capture the complexities of real social interactions. In future work, we plan to incorporate a blend of real social data and the Mix-up data for model training. Specifically, we aim to annotate images where two animals are in close proximity or interacting to enhance the model's understanding of genuine social behaviors.

      Challenges in Tail Tracking in Social Contexts: Tracking the tails of mice in social situations remains a significant challenge. To validate this, we have added an assessment of tracking performance in real social settings using homecage data. Our findings indicate that using annotated data from real environments significantly improves tail tracking accuracy, as demonstrated in the supplementary video.

      We appreciate your feedback, which highlights critical areas for improvement in our model.

      Finally, particularly in the methods section, there were a number of places where what was actually done wasn't clear.

      We have carefully reviewed and revised the corresponding parts to clarify previously unclear statements. Thank you for your valuable feedback, which has helped enhance the clarity of our methods.

      For example in describing the network architecture, the authors say "Subsequently, network separately process these features in three branches, compute features at scale of one-fourth, one-eight and one-sixteenth, and generate one-eight scale features using convolution layer or deconvolution layer." Does only the one-eight branch have deconvolution or do the other branches also?

      We apologize for the confusion this has caused. Upon reviewing our manuscript, we identified an error in the diagram. In the revised version, we have clarified that the model samples feature maps at multiple resolutions and ultimately integrates them at 1/8 resolution for feature fusion. Specifically, the 1/4 feature map from ResNet50's stack 2 is processed through max-pooling and convolution to generate a 1/8 feature map. Additionally, the 1/4 feature map from ResNet50's stack 2 is also transformed into a 1/8 feature map using a convolution with a stride of 2. Finally, both the input and output of the transformer are at 1/16 resolution, which keeps training feasible on a 2080Ti GPU. The 1/16 feature map is then upsampled to produce the final 1/8 feature map. We have updated the manuscript to reflect these changes and have also modified the model architecture diagram for clarity.

      Similarly, for the speed test, the authors say "Here we evaluate the inference speed of ADPT. We compared it with DeepLabCut and SLEAP on mouse videos at 1288 x 964 resolution", but in the methods section they say "The image inputs of ADPT were resized to a size that can be trained on the computer. For mouse images, it was reduced to half of the original size." Were different image sizes used for training and validation? Or Did ADPT not use 1288 x 964 resolution images as input which would obviously have major implications for the speed comparison?

      For our inference speed evaluation, all models, including ADPT, used images with a resolution of 1288 x 964. In ADPT's processing pipeline, the first layer is a resizing layer designed to compress the images to a scale determined by the global scale parameter. For the mouse images, we set the global scale to 0.5, allowing our GPU to handle the data at that resolution during transformer training.

      We recorded the time taken by ADPT to process the entire 15-minute mouse video, including the time taken for the resizing operation, and then calculated the frames per second (FPS). We have clarified this process in the manuscript, particularly in the "Network Architecture" section, where we specify: "Initially, ADPT will resize the images to a scale (a hyperparameter, consistent with the global scale in the DLC configuration)."
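      The point that resizing is counted in the reported speed can be illustrated with a minimal timing sketch. It assumes frames arrive as NumPy arrays, uses a simple nearest-neighbour downscale as a stand-in for ADPT's resizing layer, and takes a hypothetical model callable; none of these names come from the ADPT code base. The only claim it illustrates is that FPS = number of frames / elapsed time, with the resize step inside the timed loop.

```python
import time

import numpy as np


def measure_fps(frames, model, global_scale=0.5):
    """Hypothetical helper: FPS for resize + inference over all frames."""
    start = time.perf_counter()
    for frame in frames:
        h, w = frame.shape[:2]
        new_h, new_w = int(h * global_scale), int(w * global_scale)
        # Nearest-neighbour downscale by index sampling; this stands in for
        # the network's first (resizing) layer and is timed with inference.
        rows = (np.arange(new_h) / global_scale).astype(int)
        cols = (np.arange(new_w) / global_scale).astype(int)
        small = frame[rows][:, cols]
        model(small)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```

      With 1288 x 964 input frames and a global scale of 0.5, the model in this sketch receives 644 x 482 images, while the timing still starts from the full-resolution frames.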

      Similarly, for the individual ID experiments, the authors say "In this experiment, we used videos featuring different identified mice, allocating 80% of the data for model training and the remaining 20% for accuracy validation." Were frames from each video randomly assigned to the training or validation sets? Frames from the same video are very correlated (two frames could be just 1/30th of a second different from each other), and so if training and validation frames are interspersed with each other validation performance doesn't indicate much about performance on more realistic use cases (i.e. using models trained during the first part of an experiment to maintain ids throughout the rest of it.)

      In our study, we actually utilized the first 80% of frames from each video for model training and the remaining 20% for testing the model's ID tracking accuracy. We have revised the relevant description in the manuscript to clarify this process. The updated description can be found in the "Datasets" section under "Mouse Videos of Different Individuals."
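      The difference between this chronological split and a random frame-level split can be shown with a short sketch. It assumes each video is identified only by its frame count; the function name chronological_split and the example video names are hypothetical. Because the test block starts after the last training frame, no test frame lies between two closely correlated training frames, which mirrors the realistic use case the reviewer describes.

```python
def chronological_split(n_frames, train_fraction=0.8):
    """Hypothetical helper: first train_fraction of frames for training, rest for testing."""
    cut = int(n_frames * train_fraction)
    return list(range(cut)), list(range(cut, n_frames))


# Applied per video, so each identified mouse contributes its own
# leading training block and trailing test block.
frames_per_video = {"mouse_01": 10, "mouse_02": 5}
splits = {name: chronological_split(n) for name, n in frames_per_video.items()}
```

      A random split would instead shuffle the frame indices before cutting, which interleaves training and test frames that may be only 1/30 s apart.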

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aims to uncover molecular and structural details underlying the broad substrate specificity of glycosaminoglycan lyases belonging to a specific family (PL35). They determined the crystal structures of two such enzymes, conducted in vitro enzyme activity assays, and a thorough structure-guided mutagenesis campaign to interrogate the role of specific residues. They made progress towards achieving their aims but I see significant holes in data that need to be determined and in the authors' analyses.

      Impact on the field:

      I expect this work will have a limited impact on the field, although, with additional experimental work and better analysis, this paper will be able to stand on its own as a solid piece of structure-function analysis.

      Strengths:

      The major strengths of the study were the combination of structure and enzyme activity assays, comprehensive structural analysis, as well as a thorough structure-guided mutagenesis campaign.

      Weaknesses:

      There were several weaknesses, particularly:

      (1) The authors claim to have done an ICP-MS experiment to show Mn2+ binds to their enzyme but did not present the data. The authors could have used the anomalous scattering properties of Mn2+ at the synchrotron to determine the presence and location of this cation (i.e. fluorescence spectra, and/or anomalous data collection at the Mn2+ absorption peak).

      Thank you for your kind comment and suggestion. Many studies have used ICP-MS to detect metal ions within proteins (doi: 10.1016/j.jbc.2023.103047; doi: 10.1074/jbc.RA119.011790), so we used this method to determine which metal ions are present in the GAGases. In the revised manuscript, the ICP-MS data are presented in Supplemental Table S1.

      (2) The authors have an over-reliance on molecular docking for understanding the position of substrates bound to the enzyme. The docking analysis performed was cursory at best; Autodock Vina is a fine program but more rigorous software could have been chosen, as well as molecular dynamics simulations. As well, the authors do not use any substrate/product-bound structures from the broader PL enzyme family to guide the placement of the substrates in the GAGases, and interpret the molecular docking models.

      Thank you for your kind comments. The interaction between the enzyme and ligand should ideally be confirmed by resolving the structure of the enzyme-ligand complex. Unfortunately, our attempts to obtain co-crystals of GAGases with various oligosaccharide substrates were unsuccessful. We therefore used docking with Autodock Vina to explain the catalytic mechanism of these polysaccharide lyases, although we recognize the limitations of this approach. In the revised manuscript, we predicted the substrate binding site of GAGase II using Caver Web 1.2 and additionally performed molecular docking near this site using Molecular Operating Environment (MOE) to verify the docking results (Figure 6, Supplemental Figure S4). In addition, a series of enzyme-substrate complex structures of characterized PL family enzymes that are structurally similar to the GAGases are shown in Supplemental Figure S2; the positions of their catalytic cavities and their substrate binding modes are similar to those in our docking results, which further supports the reliability of our molecular docking results.

      (3) The conclusion that the structures of GAGase II and VII are most similar to the structures of alginate lyases (Table 2 data), and the authors' reliance on DALI, are both questionable. DALI uses a global alignment algorithm, which when used for multi-domain enzymes such as these tends to result in sub-optimal alignment of active site residues, particularly if the active site is formed between the two domains as is the case here. The authors should evaluate local alignment methods focused on the optimization of the superposition of a single domain; these methods may result in a more appropriate alignment of the active site residues and different alignment statistics. This may influence the overall conclusion of the evolutionary history of these PL35 enzymes.

      Thank you for your kind question. As suggested, multiple structural alignments were carried out separately for the (α/α)<sub>n</sub> toroid and the antiparallel β-sheet domain, based on the structures of GAG/alginate lyases from the PL5, PL8, PL12, PL15, PL17, PL21, PL23, PL36, PL38 and PL39 families. The results showed that the overall structure of GAGases is most similar to that of the PL15, PL17 and PL39 family alginate lyases, which have an (α/α)<sub>6</sub> toroid and an antiparallel β-sheet domain (Table 3). At the level of the individual domains, most of these enzymes have an (α/α)<sub>6</sub> toroid and an antiparallel β-sheet, as shown in Table 3. We also noted that GAGases possess an (α/α)<sub>6</sub> rather than an (α/α)<sub>7</sub> toroid, and revised the relevant statement in the manuscript.

      (4) The data on the GAGase III residue His188 are not well interpreted; substitution of this residue clearly impacts HA and HS hydrolysis as well. The data on the impact on alginate hydrolysis are weak, which could be due to the fact that the WT enzyme has poor activity against alginate to start with.

      Thank you very much for your helpful comments and questions. To test whether the weak effect on alginate hydrolysis could be due to the poor activity of wild-type GAGase III, we degraded alginate using different amounts of enzyme (3 to 30 μg) and analyzed the degradation products. The alginate-degrading activity of GAGase III-H188A and GAGase III-H188N was abolished, even at a very high enzyme-to-substrate ratio (30 μg enzyme to 30 μg substrate; Supplemental Figure S3A), whereas their GAG-degrading activity was only partially affected. This indicates that His<sup>188</sup> plays a more important role in the digestion of alginate than of other substrates. Unfortunately, the N191H substitution in GAGase II did not confer the alginate-degrading ability of GAGase III. We therefore suggest that His<sup>188</sup> plays a key role in the specificity of GAGase III for alginate, but that other determinants also contribute. In future studies, we will pursue other approaches to obtain enzyme-substrate co-crystal structures and explain the mechanism of substrate selectivity.

      (5) The authors did not use the words "homology", "homologous", or "homolog" correctly (these terms mean the subjects have a known evolutionary relationship, which may or may not be known in the contexts the authors used these targets); the words "similarity" and "similar" are recommended to be used instead.

      Thank you for your helpful suggestions. We have revised the relevant part of the description in the manuscript.

      (6) The authors discuss a "shorter" cavity in GAGases, which does not make sense and is not supported by any figure or analysis. I recommend a figure with a surface representation of the various enzymes of interest, with dimensions of the cavity labeled (as a supplemental figure). The authors also do not specifically define what subsites are in the context of this family of enzymes, nor do they specifically label or indicate the location of the subsites on the figures of the GAGase II and IV enzyme structures.

      Thank you for your helpful suggestions. As suggested, surface representations of GAGase II and several structurally similar GAG/alginate lyases, with the cavity dimensions labeled, have been added as Supplemental Figure S2. Given the correlation between enzyme specificity and substrate-binding sites, we speculate that a shorter substrate-binding cavity may place fewer constraints on substrate binding, allowing the enzyme to accommodate a wider variety of substrates. This hypothesis remains to be verified by solving the crystal structures of enzyme-substrate complexes.

      Reviewer #2 (Public review):

      Summary:

      Wei et al. present the X-ray crystallographic structures of two PL35 family glycosaminoglycan (GAG) lyases that display a broad substrate specificity. The structural data show that there is a high degree of structural homology between these enzymes and GAGases that have previously been structurally characterized. Central to this are the N-terminal (α/α)7 toroid domain and the C-terminal two-layered β-sheet domain. Structural alignment of these novel PL35 lyases with previously deposited structures shows a highly conserved triplet of residues at the heart of the active sites. Docking studies identified potentially important residues for substrate binding and turnover, and subsequent site-directed mutagenesis paired with enzymatic assays confirmed the importance of many of these residues. A third PL35 GAGase that is able to turn over alginate was not crystallized, but a predicted model showed a conserved active site Asn was mutated to a His, which could potentially explain its ability to act on alginate. Mutation of the His into either Ala or Asn abrogated its activity on alginate, providing supporting evidence for the importance of the His. Finally, a catalytic mechanism is proposed for the activity of the PL35 lyases. Overall, the authors used an appropriate set of methods to investigate their claims, and the data largely support their conclusions. These results will likely provide a platform for further studies into the broad substrate specificity of PL35 lyases, as well as for studies into the evolutionary origins of these unique enzymes.

      Strengths:

      The crystallographic data are of very high quality, and the use of modern structural prediction tools to allow for comparison of GAGase III to GAGase II/GAGase VII was nice to see. The authors were comprehensive in their comparison of the PL35 lyases to those in other families. The use of molecular docking to identify key residues and the use of site-directed mutagenesis to investigate substrate specificity was good, especially going the extra distance to mutate the conserved Asn to His in GAGase II and GAGase VII.

      Weaknesses:

      The structural models simply are not complete. A cursory look at the electron density and the models show that there are many positive density peaks that have not had anything modelled into them. The electron density also does not support the placement of a Mn2+ in the model. The authors indicate that ICP-MS was done to identify the metal, but no ICP-MS data is presented in the main text or supplementary. I believe the authors put too much emphasis on the possibility of GAGase III representing an evolutionary intermediate between GAG lyases and alginate lyases based on a single Asn to His mutation in the active site, and I don't believe that enough time was spent discussing how this "more open and shorter" catalytic cavity would necessarily mean that the enzyme could accommodate a broader set of substrates. Finally, the proposed mechanism does not bring the enzyme back to its starting state.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      (1) The number of significant digits used in Table 1 and the Figure 3 legend is not justified. The authors should use a maximum of 2 significant digits.

      Thank you for your kind suggestion. We have verified the relevant data and retained two significant digits.

      (2) The authors should use the words "mutant" or "mutation" only when discussing DNA, but when discussing protein, the words "variant" and "substitution" should be used instead as these are more appropriate.

      Thank you for your helpful suggestions. We have revised the relevant description in the manuscript as you suggested.

      (3) Lines 102-110 are a long, run-on sentence that should be split into shorter sentences. Similarly, lines 367-378 should be split into shorter sentences.

      Thank you for your suggestions. In the revised manuscript, the long sentences in lines 102-110 and 367-378 have been rewritten into shorter ones.

      (4) Lines 174-175: His, Tyr, Glu, and Trp are not positively charged residues and this wording should be changed.

      Thank you for your suggestions. We have revised the relevant description in the manuscript as you suggested.

      (5) Lines 423-426 require a reference.

      Thank you for your suggestion. We have provided the reference at the appropriate position and revised the relevant description in the manuscript as you suggested.

      (6) Grammar/language:

      -line 90 - change "should emerge" to "likely emerged"

      -line 145 - delete "Finally"

      -line 264 - delete "their"

      -line 265 - delete "active sites"

      -line 265-266 - change to "To confirm this hypothesis, site-directed mutagenesis followed by enzyme activity assay was performed"

      -line 311 - change "residue in the catalytic cavity of GAGase III, which.." to "residue in its catalytic cavity, which..."

      -line 318 - change "affect" to "affected"

      -line 323 - change to "degrading activity of GAGase II remains to be determined outside of the His188 residue"

      -line 345 - delete "assays"

      -line 359 - change to "evidence"

      -line 397 - change "folds" to "3D fold"

      -line 420 - change to "share similar catalytic sites"

      -lines 411, 433 - change "conversed" to "conserved"

      -line 441 - change to "Mutational analysis showed that the His188.."

      -line 450 - delete "which"

      Thank you for your suggestions. The grammatical errors have been corrected in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Major Concerns

      The electron density in your model clearly does not support the placement of a Mn ion. In the GAGase II structure, the placement of the Mn and the placement of waters around it still results in two density peaks of > 12 rmsd. The manuscript suggests that ICP-MS was done but the results of this are not shown anywhere. Please include your ICP-MS data. I see the structures have already been deposited, and if they have been deposited unchanged, please see if you can modify them to actually finish building the models. I don't find your data in Figure 2B particularly convincing that Mn is necessarily important for activity.

      Thank you for your kind comments. ICP-MS is a common method for detecting metal ions bound to proteins (doi: 10.1016/j.jbc.2023.103047; doi: 10.1074/jbc.RA119.011790), and we therefore used it to identify the metal atoms within GAGases in this study. In the revised manuscript, the ICP-MS data are presented in “Supplemental Table S1”. These data show that the Mn<sup>2+</sup> content of the test sample, unlike that of the other metals, is much higher than that of the negative control, suggesting that Mn<sup>2+</sup> is bound to the protein. We agree that, as with the other metal ions tested, the addition of Mn<sup>2+</sup> does not strongly enhance the activity of GAGase II. However, the addition of EDTA significantly inhibited enzyme activity (Figure 2), indicating that a metal ion such as Mn<sup>2+</sup> is necessary for GAGase function. Whether the metal ion participates in the catalytic reaction or only stabilizes the enzyme structure remains to be explored in future studies.

      Minor Concerns

      (1) Please include CC1/2 in your Table 1.

      Thank you for your kind suggestions. CC1/2 parameters have been added in the revised manuscript (Table 1).

      (2) If possible, please include SDS-PAGE gel images of your purified proteins, particularly for the point mutations. Ideally, you would have done SEC on your mutants to show that the reduction in activity is not due to aggregation/misfolding, but at the very least I would like to see that you have similar levels of purity.

      Thank you for your kind suggestions. As suggested, we have added SDS-PAGE gel images of purified GAGase II, GAGase III, GAGase VII, and their mutant enzymes to the supplementary data. As shown in Figure S5, site-directed mutagenesis did not affect the soluble expression levels of GAGase II, GAGase III or GAGase VII, indicating that the reduction in activity is not due to aggregation or misfolding. Because of the large number of variants, we used crude enzyme in the activity assays of the substrate-binding-site variants, whereas for some key catalytic residues we purified the corresponding mutant enzymes and verified their activities by HPLC.

      (3) When referring to your structural predictions, it is not appropriate to say that you used Robetta. Your reference is correct though - you should say that the structures were predicted using RoseTTAfold.

      Thank you for your helpful suggestions. We have revised the relevant description in the manuscript.

      (4) If possible expand on how the shorter/more open active site cavity would result in broader substrate specificity.

      Thank you for your kind comment. In the revised manuscript, surface representations of GAGase II and several representative, structurally similar GAG/alginate lyases, with the cavity dimensions labeled, have been added as Supplemental Figure S2. Given the correlation between enzyme specificity and substrate-binding sites, we speculate that a shorter substrate-binding cavity may place fewer constraints on substrate binding, allowing the enzyme to accommodate a wider variety of substrates. Unfortunately, we did not succeed in obtaining co-crystals of GAGases with any of the substrates. In future studies, we will try to explain the mechanism of substrate selectivity by growing and solving crystals of enzyme-substrate complexes or by other approaches.

      (5) I would put less emphasis on His188 in GAGase III being a strong indicator that this protein represents an evolutionary intermediate between alginate lyases and GAGases.

      Thank you for your comment. The His<sup>188</sup> residue, which is unique to GAGase III among the GAGases, is essential for its alginate-degrading activity. GAGases are thought to represent a possible evolutionary intermediate between alginate lyases and GAG lyases because phylogenetic analysis demonstrated that GAGases show considerable sequence similarity to some characterized GAG lyases and alginate lyases (DOI: 10.1016/j.jbc.2024.107466). This similarity in primary structure suggests structural similarity, which is further supported by this study. Because structure determines function, structural similarity is often used as a key criterion in studies of protein evolution, and GAGase III, which shows substantial GAG- and alginate-degrading activity, supports this hypothesis. Of course, our structure-based analysis of the evolutionary relationship between GAGases and characterized GAG lyases and alginate lyases is an attempt using existing methods, and our conclusions remain a hypothesis that requires further evidence to validate.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript under review investigates the role of periosteal stem cells (P-SSC) in bone marrow regeneration using a whole-bone subcutaneous transplantation model. While the model is somewhat artificial, the findings were interesting, suggesting the migration of periosteal stem cells into the bone marrow and their potential to become bone marrow stromal cells. This indicates a significant plasticity of P-SSC consistent with previous reports using fracture models (Cell Stem Cell 29:1547, Dev Cell 59:1192).

      Major Concerns

      (1) The authors assert that the periosteal layer was completely removed in their model, which is crucial for their conclusions. To substantiate this claim, it is recommended that the authors provide evidence of the successful removal of the entire periosteal stem cell (P-SSC) population. A colony-forming assay, with and without periosteal removal, could serve as a suitable method to demonstrate this.

      We are grateful to the reviewer for this valuable suggestion. The objective of this experiment was to demonstrate that periosteal ablation impairs bone marrow regeneration, a finding that is supported by our results. We expect that ablation of the periosteum would be associated with only a partial decrease in CFU-F activity, given the presence of MSCs in the bone and in the endosteal region of the bone marrow. Therefore, CFU-F assays would be difficult to interpret in this setting. Given that the phenotype obtained provides proof of concept for the importance of the periosteum, we do not believe that further experiments would strengthen this conclusion.

      (2) The observation that P-SSCs do not express Kitl or Cxcl12, while their bone marrow stromal cell (BM-MSC) derivatives do, is a key finding. To strengthen this conclusion, the authors are encouraged to repeat the experiment using Cxcl12 or Scf reporter alleles. Immunofluorescence staining that confirms the migration of periosteal cells and their transformation into Cxcl12- or Scf-reporter-positive cells would significantly enhance the paper's key conclusion.

      Transplantation of periosteum isolated from Cxcl12 or Scf reporter mice into WT bones is an excellent suggestion. Indeed, this experiment would confirm (1) the migration of periosteal SSCs and (2) the expression of Cxcl12 and Scf by BM-MSCs derived from the periosteum. However, current limitations in available resources preclude these experiments. Moreover, the use of PostnCre<sup>ER</sup>;Tmt mice represents the optimal approach for tracking and specifically isolating periosteum-derived BM-MSCs. The expression of Cxcl12 and Scf by periosteum-derived BM-MSCs has been demonstrated in two distinct experimental models (Figures 5 and 6).

      (3) On page 8, line 20, the authors' statement regarding the detection of Periostin+ cells outside the periosteum layer could be misinterpreted due to the use of the periostin antibody. Given that periostin is an extracellular matrix protein, the staining may not accurately represent Periostin-expressing cells but rather the presence of periostin in the extracellular matrix. The authors should revise this section for greater precision.

      We acknowledge and appreciate the reviewer's attention to detail. This is, in fact, an error. Nestin-GFP-positive periosteal SSCs are seen within the periosteum marked by an anti-periostin antibody labeling the extracellular matrix of the periosteum. The manuscript has been revised to address this inaccuracy on page 9, lines 8-9.

      Reviewer #2 (Public review):

      Summary:

      The authors have established a femur graft model that allows the study of hematopoietic regeneration following transplantation. They have extensively characterized this model, demonstrating the loss of hematopoietic cells from the donor femur following transplantation, with recovery of hematopoiesis from recipient cells. They also show evidence that BM MSCs present in the graft following transplantation are graft-derived. They have utilized this model to show that following transplantation, periosteal cells respond by first expanding, then giving rise to more periosteal SSCs, and then migrating into the marrow to give rise to BM MSCs.

      Strengths:

      These studies are notable in several ways:

      (1) Establishment of a novel femur graft model for the study of hematopoiesis;

      (2) Use of lineage tracing and surgery models to demonstrate that periosteal cells can give rise to BM MSCs.

      We thank the reviewer for noting the novelty of our manuscript.

      Weaknesses:

      There are a few weaknesses. First, the authors do not definitively demonstrate the requirement of periosteal SSC movement into the BM cavity for hematopoietic recovery. Hematopoiesis recovers significantly before 5 months, even before significant P-SSC movement has been shown, and hematopoiesis recovers significantly even when periosteum has been stripped.

      This is an important point. Notably, we can see expansion of P-SSCs by day 8 after femur transplantation and evidence of periosteum-derived SSCs in the bone marrow by day 15, before we can detect any significant hematopoietic recovery (see Figure 3A-C).

      Second, it is not clear how the periosteum is changing in the grafts. Which cells are expanding is unclear, and it is not clear if these cells have already adopted a more MSC-like phenotype prior to entering the marrow space.

      This is an interesting question. To examine early changes in gene expression in periosteal SSCs in grafted femurs, we performed additional RNA sequencing at an earlier time point, 3 days after femur transplantation, comparing periosteal SSCs from grafted femurs with host periosteal SSCs and host bone marrow MSCs (see new Supplementary Figure S5 A-C). At this time point, the three cell populations are already distinct on the PCA plot (Figure S5A), and some periosteal genes are downregulated in the graft P-SSCs (Figure S5B). However, we do not yet see upregulation of Kitl, Cxcl12 or most other BM MSC genes in graft P-SSCs at this time point (Figure S5B). Furthermore, gene set enrichment analysis (GSEA) revealed upregulation of cell cycle, DNA replication and mismatch repair gene signatures, and downregulation of multiple gene signatures, compared to host P-SSCs (Figure S5C). We therefore conclude that P-SSCs undergo some gene expression changes early after femur transplantation but have not yet fully differentiated into BM MSCs at this time point. This experiment is now discussed on p.10 of the revised manuscript.

      Indeed, given the presence of host-derived endothelial cells in the BM, these studies are reminiscent of prior studies from this group and others that re-endothelialization of the marrow may be much more important for determining hematopoietic regeneration, rather than the P-SSC migration.

      Indeed, as previously shown by our group and others, we agree that endothelial regeneration and re-endothelialization may also play an important role in this bone marrow regeneration model. It is noteworthy that this model has the potential to serve as a valuable tool for analyzing the origin of BM endothelial cells during regeneration processes. To further illustrate the endothelial regeneration, additional images of bone sections from VE-cadherin-cre;TdTomato grafted femurs at 15 days, one month, and five months post-transplantation have been included in the new Figure S3. These images reveal extensive vascularization of the graft and proximity of UBC-GFP+ donor-derived vessels to VE-cadherin+ host-derived blood vessels in the bone marrow within one month (see Figure S2C). This observation is consistent with the timing of both BM MSC recovery and HSC recovery in the grafts, thereby suggesting the importance of endothelial recovery (see Fig. 1B). A new discussion of these findings has been included on page 6 of the revised manuscript and on page 16 in the discussion section.

      Third, the studies exploring the preferential depletion of BM MSCs vs P-SSCs are difficult to interpret. The single metabolic stress condition chosen was not well-justified, and the use of purified cell populations to study response to stress ex vivo may have introduced artifacts into the system.

      We chose to focus on hypoxia as the main condition in which to analyze the stress response of P-SSCs vs BM MSCs because we reasoned that due to the location of P-SSCs on the outside of the bone, these cells would be exposed to a higher oxygen tension than BM-MSCs, which are located within the bone marrow. Therefore, we wanted to determine whether this exposure to a different oxygen tension would be sufficient to explain the different properties of P-SSCs and BM MSCs. We modified the text on p.11 of the manuscript to explain the rationale for this experiment better.

      Reviewer #3 (Public review):

      Summary:

      Marchand, Akinnola, et al. describe the use of the novel model to study BM regeneration. Here, they harvest intact femurs and subcutaneously graft them into recipient mice. Similar to standard BM regeneration models, there is a rapid decrease in cellularity followed by a gradual recovery over 5 months within the grafts. At 5 months, these grafts have robust HSC activity, similar to HSCs isolated from the host femur. They find that periosteum skeletal stem cells (p-SSCs) are the primary source of BM-MSCs within the grafted femur and that these cells are more resistant to the acute stress of grafting the femur.

      Strengths:

      This is an interesting manuscript that describes a novel model to study BM regeneration. The model has tremendous promise.

      We thank the reviewer for highlighting the novelty and potential of our work.

      Weaknesses:

      The authors claim that grafting intact femurs subcutaneously is a model of BM regeneration and can be used as a replacement for gold standard BM regeneration assays such as sublethal chemo/irradiation. However, there isn't enough explanation as to how this model is equivalent or superior to the traditional models. For instance, the authors claim that this model allows for the study of "BM regeneration in vivo in response to acute injury using genetic tools." This can and has been done numerous times with established, physiologically relevant BM regeneration models. The onus is on the authors to discuss or perform the necessary experiments to justify the use of this model. For example, standard BM regeneration models involve systemic damage that is akin to therapies that require BM regeneration. How is studying the current model that provides only an acute injury more relevant and useful than other models? As it stands, it seems as if the authors could have done all the experiments demonstrating the importance of these p-SSCs in the traditional myelosuppressive BM regeneration models to be more physiologically relevant. Along these lines, the use of a standard BM regeneration model (e.g., sublethal chemo/irradiation) as a critical control is missing and should be included. Even if the control doesn't demonstrate that p-SSCs can contribute to the BM-MSC during regeneration, it will still be important because it could be the justification for using the described model to specifically study p-SSCs' regulation of BM regeneration.

      We appreciate the reviewer raising this important point. We never intended this femur transplantation model of bone marrow injury to replace more established models, such as chemotherapy or irradiation. In fact, we compared the effects of femur transplantation and localized bone irradiation on P-SSCs using our Periostin-Cre;Td-Tomato lineage tracing model. We found that irradiation does not induce the migration of Tomato+ P-SSCs from the periosteum to the bone marrow cavity that femur transplantation does, and therefore cannot be used to demonstrate the plasticity of P-SSCs in the same way (see new Supplementary Figure S7D-E). Femur transplantation therefore appears to be a more severe form of bone marrow injury that differs from the more established assays. We also added this discussion to the revised manuscript on p.14 and in the discussion section on p.17.

      The authors perform some analysis that suggests that grafting a whole femur mimics BM regeneration, but there are many experiments missing from the manuscript that will be necessary to support the use of this model. To demonstrate that this new model mimics current BM regeneration models, the authors need to perform a careful examination of the early kinetics of hematopoietic recovery post-transplant. Complete blood counts should be performed on the grafts, focusing on white blood cells (particularly neutrophils), red blood cells, platelets, all critical indicators of BM regeneration. This analysis should be done at early time points that include weekly analysis for a minimum of 28 days following the graft. Additionally, understanding how and when the vasculature recovers is critical. This is particularly important because it is well-established that if there is a delay in vascular recovery, there is a delay in hematopoietic recovery. As mentioned above, a standard BM regeneration model should be used as a control.

      We concur with the reviewer that hematopoietic recovery is a pivotal aspect of this model. We conducted a time-course analysis of bone marrow and HSC cellularity from day 0 to month 5 post-transplantation (Figure 1B). We also evaluated HSC function through bone marrow transplantation from grafted or host femurs (Figures 1D and 1E) and quantified the various hematopoietic cells in the graft after five months (Supplemental Figure 1). In addition, hematopoiesis in the transplanted bone was comprehensively evaluated in another article, currently in revision and available on bioRxiv (Takeishi, S., Marchand, T., Koba, W. R., Borger, D. K., Xu, C., Guha, C., Bergman, A., Frenette, P. S., Gritsman, K., & Steidl, U. (2023). Haematopoietic stem cell numbers are not solely determined by niche availability. bioRxiv: the preprint server for biology, 2023.10.28.564559. https://doi.org/10.1101/2023.10.28.564559). We did not use another assay of bone marrow regeneration as a “control”, since we do not expect to see similar plasticity of periosteal SSCs in these models, as illustrated by the localized irradiation model described in the new Figure S7D-E.

      We agree with the reviewer that endothelial recovery is also likely to be very important for hematopoietic recovery in this model, but this was not the focus of this manuscript. The process of endothelial recovery is likely to be more complex than that of MSC recovery, as our findings indicate that the graft endothelium can arise from both the host and the graft femur (see Fig.2D). Consequently, further investigation into the mechanisms of endothelial recovery and its contribution to hematopoiesis in this experimental system will be an interesting focus of future work. We believe that this bone transplantation model represents a valuable tool for addressing questions regarding the origin and regeneration mechanisms of bone marrow endothelial cells.

      The contribution of donor and host cells to the BM regeneration of the graft is interesting. Particularly, the chimerism of the vasculature. One can assume that for the graft to undergo BM regeneration, there needs to be the delivery of nutrients into the graft via the vasculature. The chimerism of the vascular network suggests that host endothelial cells anastomose with the graft. Host mice should have their vascular system labeled with a dye such as dextran to determine if anastomosis has occurred. If not, the authors need to explain how this graft survives up to 5 months. If anastomosis does occur, then it is very surprising that the hematopoietic system of the graft is not a chimera because this would essentially be a parabiosis model. This needs to be explained.

      We have included additional images of bone sections from VE-cadherin-cre;tdTomato grafted femurs at 15 days, one month, and five months post-transplantation in the new Figure S3. These images show extensive vascularization of the graft and proximity of UBC-GFP+ donor-derived vessels to VE-cadherin+ host-derived blood vessels in the bone marrow within one month, suggesting a potential anastomosis (Figure S2C). However, it is not surprising that hematopoiesis arises exclusively from the host, as we observed complete death of the hematopoietic cells and BM MSCs in the graft femur within the first 3 days of femur transplantation (see Figure S1A), and we do not see any significant hematopoietic recovery in the grafts until at least 2 months (see Fig. 1B). Therefore, this is not similar to a parabiosis model, as confirmed by our chimerism studies shown in Figure 2D. In addition, these data are consistent with the results reported with the use of ossicles (doi:10.1038/nature09262; DOI 10.1016/j.cell.2007.08.025; doi:10.1038/nature07547).

      Most of the data presented for the resistance of p-SSCs to stress suggest a DNA damage response. Do p-SSCs demonstrate a higher ability to resolve DNA damage? Do they accumulate less DNA damage? Staining for DNA damage foci or performing comet assays could be done to further define the mechanism of the stress resistance properties of p-SSCs.

      This is an interesting question. In our RNA sequencing analysis of graft P-SSCs compared with host P-SSCs we did observe an upregulation of mismatch repair gene signatures by gene set enrichment analysis (GSEA) (new Figure S5C). Therefore, it is possible that P-SSCs do have an altered DNA damage response. However, we are unable to investigate this further at this time.

      Given the importance of BM-MSCs in hematopoiesis and that the majority of the emerging BM-MSCs appear to be derived from p-SSCs, the authors should perform experiments to determine if p-SSC-derived BM-MSCs are critical regulators of BM regeneration. For example, the authors could test this by crossing the Postn-creER mice with iDTR mice to ablate these cells and see if recovery is inhibited or delayed. This should be done with the described periosteum-wrapped femur graft model as well as a control BM regeneration model. Demonstrating that the deletion of these cells affects BM regeneration in both models would further justify the physiological relevance and utility of the femur graft model.

      We thank the reviewer for this excellent suggestion, and we agree that this is an important experiment. However, our attempts to ablate Postn+ cells using the iDTA system were limited by technical difficulties, which we are unable to address at this time.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 2C, the vascular network staining appears to be duplicated, suggesting a possible error in image capture. The authors should replace this image with a different field or an alternative picture to avoid confusion.

      We thank the reviewer for noting this accidental duplication due to an image stitching problem. Figure 2C was replaced by a different image from the same experiment.

      (2) For consistency and clarity, a scale bar should be included in Figure S3E to indicate that the magnification factors of the respective visual fields are identical.

      We thank the reviewer for highlighting this point. The magnification used has been added to the revised figure.

      (3) In Figure S5B, the difference in normalized Opn mRNA expression relative to Gapdh between steady-state BM-MSCs and P-SSCs seems substantial, which contradicts the "ns" (not significant) label. The authors should verify the accuracy of this labeling.

      We agree with the reviewer that this difference in what is now Figure S6B looks substantial. However, we confirmed that this difference is not statistically significant, likely due to the high variability between replicates in Opn expression in the steady state BM MSCs.

      Reviewer #2 (Recommendations for the authors):

      In order to strengthen the argument that P-SSCs are necessary for hematopoietic recovery, the authors should consider providing the following data:

      (1) In the periosteal stripping experiments, the authors should show if periosteum-derived MSCs are present in the BM throughout the process of hematopoietic recovery (not just at the end of the experiment). If none are present at the end, that would mean that periosteum is not required for hematopoietic recovery, but would still suggest that it is required for optimal hematopoietic recovery. At early time points, it would also be very helpful to demonstrate the composition and amount of endothelium present in the marrow to determine if P-SSC migration and differentiation into MSCs depends on endothelial reconstitution.

      To further examine the vascularization of the transplanted femur at an earlier time point, we have added additional images of grafted femur from VE-cadherin-cre;tdTomato at 15 days and one month post transplantation in the new Figure S3A and S3B. These images already show extensive vascularization of the graft periosteum stained with an anti-periostin antibody. In addition, we observed anastomoses of host VE-cadherin;Tmt+ blood vessels with graft ubc-GFP+ blood vessels in the grafted periosteum within one month (Figure S3C).

      (2) Studies of the surgical periosteum grafts could benefit from histologic analysis of the BM and its MSC components at earlier time points following grafting since the data provided are only at 5 months. Such studies would allow a better appreciation of the relationship between P-SSC migration into the marrow and hematopoietic recovery.

      We have performed histologic analysis of grafted femurs at multiple early time points, which shows expansion of P-SSCs and their migration into the bone marrow cavity (Figure 3C).

      (3) Studies of stress responses preferably should be performed using intact bone and should characterize P-SSC and BM MSC apoptosis, cell cycle status, differentiation, etc, immediately following shifts to the stress conditions. These studies would be more compelling if performed using additional "stress" conditions likely to represent the graft environment.

      This is an interesting suggestion. However, these types of studies would not be possible in intact bones ex vivo, as P-SSCs are known to migrate out of the bone in culture.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Major comments:

      (1) In Figure 1 the authors could reference and use NSP8 (PMID: 38275298) and Nucleocapsid (PMID: 37185839) in their experiments as positive controls.

      Thank you for your suggestion! In Figure 1A, during our screening of SARS-CoV-2 nsp proteins regulated by MG132, we confirmed that nsp8 levels can also be restored by MG132. This finding indicates that nsp8 is degraded via the proteasome pathway and can therefore serve as a positive control for the experiment. It has been reported that nsp8 is degraded via the ubiquitin-proteasome pathway following TRIM22-mediated ubiquitination. We have added this description at line 115 in the manuscript.

      (2) The data indicating that NSP16 is ubiquitinated come from overexpression systems, and it is possible that NSP16 ubiquitination only occurs in expression contexts, not during coronavirus infection. If NSP16 ubiquitination can't be measured in the context of infection, it is unclear how we can make any conclusions. The authors need to demonstrate the ubiquitination of NSP16 in the context of viral infection.

      We greatly appreciate the reviewer's suggestion and have incorporated the corresponding experimental results. As shown in Figure 5A, co-IP experiments using an antibody against endogenous nsp16 were conducted following infection with the SARS-CoV-2 Wuhan strain. These experiments confirmed that the nsp16 protein encoded by the virus undergoes ubiquitination in infected cells. This finding demonstrates the ubiquitination of nsp16 in a biologically relevant context, thereby supporting the conclusions drawn from our overexpression experiments.

      (3) In Figure 4, adding controls will strengthen the authors' conclusion.

      a) Is it possible to observe ubiquitination of NSP16 by transfecting in NSP16-FLAG tagged, immunoprecipitate NSP16, run a western blot, and probe for endogenous ubiquitin?

      b) Can the authors please include an empty vector control as well as WT ubiquitin in these panels for comparison?

      c) In addition, why are the ubiquitination patterns different in the IP panels of D and E vs B? Without an empty vector control, it is challenging to conclude what the background is.

      Thank you for your valuable suggestions! We have made the following changes and additions in response to your comments:

      a) We have conducted the experiments as per the reviewer's suggestion, and Figure 3B shows the result. Co-IP experiments were performed, and endogenous ubiquitination of nsp16 was detected using an antibody against endogenous ubiquitin.

      b) We apologize for previously focusing solely on presenting multiple ubiquitin mutants on a single panel of nsp16 IP without considering the inclusion of an empty vector control and WT ubiquitin. The experiment has been redesigned and conducted, and the results are now presented in Figures 3E and 3F.

      c) The differences in the ubiquitination patterns observed between the IP panels in Figures 3E and 3F compared to 3C may be due to the use of different plasmids, different antibodies, and different exposure times. To address this, we have standardized the plasmids in the figure and included an empty vector control as a negative control to clarify the background signal.

      (4) Overexpression of the ubiquitin mutants may have an indirect effect on protein homeostasis. The authors could also use linkage-specific antibodies to elucidate the ubiquitin linkage associated with NSP16 ubiquitination, for example the K63-linkage Specific Polyubiquitin (D7A11) Rabbit mAb (5621S) and the K48-linkage Specific Polyubiquitin (D9D5) Rabbit mAb (8081S) from Cell Signaling Technology.

      We greatly appreciate the reviewer's excellent suggestion! Using linkage-specific antibodies to elucidate the ubiquitin linkage associated with nsp16 ubiquitination would indeed provide more direct evidence. However, due to the long lead time for obtaining these antibodies, we plan to conduct further verification in future experiments.

      (5) The authors discussed the subcellular localization of overexpressed NSP16; showing the localization of NSP16 in the context of viral infection would strengthen the study. If this is challenging, can the authors express NSP16 along with the co-factor NSP10 and examine its subcellular localization?

      Thank you for your suggestion! During viral infection, we observed the ubiquitination of the nsp16 protein through co-IP experiments, indicating that the presence of nsp10 does not influence the regulation of nsp16 ubiquitination by MARCHF7 or UBR5 (Figure 5A). Therefore, we believe that investigating the co-localization of nsp10 and nsp16 would not provide additional value to our results. Additionally, through a literature review, we found studies that have already examined the localization of nsp10 and nsp16 following viral infection. These studies revealed that nsp10 is located in the cytoplasm, while nsp16 can be detected in both the nucleus and the cytoplasm (PMID: 33080218; PMID: 34452352). This observation is consistent with the localization of nsp16 that we observed in our overexpression experiments.

      (6) a) In Figure 3A, the authors should note that the interaction of NSP16 with UBR5 appears weak. The authors should confirm that the interaction of NSP16 and the E3 ligases is relevant in the context of viral infection.

      b) In Figure 3B, the scale bars should be labeled in at least one panel, as well as in the legend.

      c) The authors discussed nuclear localization of MARCHF7, UBR5, and NSP16, therefore a control with a nuclear stain should be included in this figure to enhance the study.

      d) Some panels look overexposed while others are blurry, which decreases the robustness of the interaction as the authors stated in line 191. To strengthen the results of Figure 3, consider GST purification and in vitro, cell-free binding assays to confirm a direct interaction between nsp16 and the E3 ligases.

      Thank you for the reviewer’s thoughtful suggestions! We have made the following changes and adjustments based on your recommendations:

      a) On the interaction between nsp16 and UBR5:

      The interaction between nsp16 and UBR5 appears to be weak, possibly due to the large size of the UBR5 protein (300 kDa). As a result, there are challenges in presenting the experimental results, including difficulties in both expression and protein level detection. To further confirm the relevance of the interaction between nsp16 and the E3 ligases in the context of viral infection, we have performed experiments, and the results are presented in Figure 5A.

      b) On scale bars:

      The issue regarding the scale bars in Figure 4 has been addressed, and we have now included them in the figure legend for clarity (Line 885).

      c) On nuclear localization control:

      For the localization of MARCHF7, UBR5, and nsp16 in Figure 4C, given that both MARCHF7 and UBR5 are tagged with CFP, DAPI staining would result in spectral overlap. However, we conducted co-localization experiments for MARCHF7 or UBR5 with nsp16 in Figure 4—figure supplements 1E and 1F, where DAPI staining was included to illustrate the localization of these three proteins. Our experiments showed that while these proteins are present in both the nucleus and cytoplasm, they are predominantly localized in the cytoplasm.

      d) On validation of direct interaction:

      We attempted GST purification and in vitro cell-free binding assays to verify the direct interaction between nsp16 and the E3 ligases. However, UBR5 and MARCHF7 are both large proteins, with UBR5 being particularly large, which significantly increased the difficulty of purification. Additionally, we faced challenges in purifying nsp16, as the purified nsp16 protein tended to aggregate. We will continue to optimize purification techniques and conditions in future experiments.

      We appreciate your valuable comments, which have greatly contributed to improving our experiments and conclusions.


      (7) To confirm the knockdown of the E3 ligases by siRNA, the authors should use western blotting to show the presence, absence, or decrease of the protein levels in addition to mRNA levels by RT-PCR. The authors have the lysates, and they have shown that the antibodies for MARCHF7 and UBR5 work; therefore, including these blots throughout the manuscript would help substantiate the authors' conclusions.

      Thank you for the reviewer’s valuable suggestion! We have validated the knockdown efficiency at the protein level for the experiments involving siRNA knockdown. Corresponding Western blot images, in addition to the RT-PCR data, are now included in the relevant experiments (Figures 2, 4, and 5) to substantiate our conclusions.

      (8) In the overexpression studies of the E3 ligases with viral infection in Figure 5, the authors should include the catalytic mutants for the E3 ligases with the nsp16 gradient experiment. This would strengthen the conclusion of the studies.

      Thank you for the reviewer’s suggestion! We have conducted the relevant experiments based on your recommendation, and the corresponding data are presented in the Figure 6—figure supplements 2A-H. These results strengthen the conclusions of our study.

      (9) Figure 5: For C and F, for a better comparison of the efficacy against the 2 strains, the authors should use the same scale. This could benefit from a kinetics experiment.

      Thank you for the reviewer’s suggestion! We have made revisions in Figures 5E and 5H in response to your recommendation.

      (10) Is there a synergistic effect of double E3 knockdown on viral replication?

      Thank you for the reviewer’s question! In Figure 5—figure supplement 1A-B, we knocked down MARCHF7 or UBR5 individually and simultaneously, followed by infection with SARS-CoV-2 transmissible virus-like particles. The results revealed that simultaneous knockdown further enhances viral replication, demonstrating a synergistic effect.

      (11) In lines 98-100 the authors state "This dual targeting by MARCHF7 and UBR5 impairs the 2'-O-MTase activity of nsp16, blocking the conversion of cap-0 to cap-1 at the 5' end of viral RNA, ultimately exhibiting potent antiviral activity against SARS-CoV-2". The authors did not examine the 2'-O-MTase activity of nsp16. The authors should rephrase this or provide the data if this experiment was done.

      Thank you for the reviewer’s valuable suggestion! Based on your comment, we have revised the ambiguous wording located in lines 100-104.

      (12) In the discussion, the authors reported that identifying the specific lysine residue(s) that are ubiquitinated was challenging, stated that they generated multiple mutants including truncated mutants, and wrote "data not shown". The authors need to include these data as supplementary material.

      Thank you for the reviewer’s suggestion! Based on your comment, we have included the data on the lysine mutants and truncated mutants generated to identify the ubiquitinated residue(s) as supplementary data (Appendix-figure S2).

      (13) In Figure 7, the authors showed copy numbers of the SARS-CoV-2 E gene in lung tissue. The authors should show viral titers using either the plaque assay or the TCID50 assay.

      Thank you for the reviewer’s suggestion! Based on your comment, we measured the TCID50 of the virus in the lung tissue homogenates, and the results are presented in Figure 7D.

      Minor comments:

      (1) Line 76: while many E3 ubiquitin ligases directly recognize and bind to their target substrates, cullin-RING ligases directly bind an adaptor, which binds a substrate receptor and/or the substrate directly, while the RING-box protein binds a different surface of the cullin and is also not directly interacting with substrate.

      Thank you for the reviewer’s valuable suggestion! Based on your comment, we have revised the ambiguous wording in line 76.

      (2) Line 161: having introduced the suggestion that NSP16 is ubiquitinated by these ligases, consider moving Figure 4 to the Figure 3 spot.

      Based on your comment, we have rearranged the order of the figures and moved Figure 4 to the Figure 3 spot.

      (3) Figure 2: Can the authors please do +/- MG132 for each siRNA? It is possible that the lanes where we don't see NSP16 were because there was no NSP16 expressed, OR it was degraded, MG132 would confirm one or the other.

      Thank you for the reviewer’s suggestion! Based on your comment, we have redesigned the experiment and included the MG132 treatment for each siRNA. The results are presented in Figure 2A.

      (4) Line 165: The authors write "As confirmed by MS, both Myc-tagged MARCHF7 and endogenous UBR5 interact with nsp16, as seen in the Co-IP experiment". This should be reversed: MS suggests the NSP16-E3 interaction, and the co-IP confirms it.

      Based on your comment, we have revised the wording in line 183 to ensure accuracy. MS suggests the interaction between nsp16 and the E3 ligases, while the Co-IP experiment confirms this interaction.

      (5) Line 178: the cited paper doesn't clearly show NSP16 nuclear localization, nor do the authors of said paper claim that they found it there. It is cytoplasmic. Additionally, said paper used overexpression, and it is unclear if NSP16 is nuclear in the context of viral infection.

      Thank you for the reviewer’s suggestion! The referenced paper states, "As can be seen in the Supplementary Fig. S2, the viral proteins are either cytoplasmic (NSP2, NSP3C, NSP4, NSP8, Spike, M, N, ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8, ORF9b, and ORF10) or both nuclear and cytoplasmic (NSP1, NSP3N, NSP5, NSP6, NSP7, NSP9, NSP10, NSP12, NSP13, NSP14, NSP15, NSP16, E, and ORF9a)," indicating that nsp16 is localized in both the nucleus and cytoplasm. Upon reviewing the literature, we found that another paper (PMID: 33080218) reports the distribution of nsp16 protein following viral infection. Its results indicate that nsp16 is present in both the nucleus and cytoplasm, although the authors of that paper state that nsp16 was located in the nucleus.

      (6) Line 197: in addition to the 7 lysine residues, ubiquitin can also form linear N-terminal linkages.

      Thank you for the reviewer’s suggestion! Linear N-terminal ubiquitination, with its distinct linkage and substrate recognition mechanism, is typically mediated by the linear ubiquitin chain assembly complex (LUBAC), which contains the E3 ubiquitin ligases HOIL-1 and HOIP, and it differs from classical ubiquitination. Therefore, this type of ubiquitin chain was not investigated in our experiments.

      (7) Line 202: Authors state "Interestingly, all single-lysine Ub mutants promoted nsp16 ubiquitylation to varying degrees, indicating a complex polyubiquitin chain structure on nsp16 potentially regulated by multiple E3 ligases". However, not all the mutants. K33 isn't supported by the blot.

      Thank you for pointing that out! Indeed, we made an error in our description. The K33 mutant did not promote nsp16 ubiquitylation, and we have corrected this in the manuscript accordingly in line 173.

      (8) Line 204: consider including "E2-E3 ligase pairs" for RING ligases the E2 determines the linkage type see: Cell Research (2016) 26:423-440.

      Thank you for your suggestion! We have included the term "E2-E3 ligase pairs" in the article in line 176.

      (9) Line 235: The authors used the real virus; the inclusion of the BSL-2 virus here is extraneous and does not add anything. The authors can consider removing it.

      Thank you for your suggestion! In our experiments, we performed simultaneous knockdown of two E3 ligases, so we believe this data is relevant and should not be removed.

      (10) Line 238: Authors state: "led to a significant increase in SARS-CoV-2 levels compared to the control group". What is meant by "levels?"

      Thank you for your careful reading. We have updated "levels" to "replication" as suggested to clarify the meaning in line 237.

      (11) Line 245: "increased titers". This could be made more specific by stating, for example, a 1-log increase.

      Thank you for the reviewer's valuable suggestions. We have made the necessary changes and specified "increased titers" as a "1-log increase" in lines 249 and 261.

      (12) Line 249: in Figure 5H, the authors again show relative mRNA levels. Ideally, protein levels should be shown by western blot.

      Thank you for the reviewer's suggestion! We have performed protein-level detection of the knockdown efficiency for the samples, and the bands have been placed in the corresponding positions in Figure 5I.

      (13) Line 259: "strongly linked to their ability to modulate..." This appears to be an overextension of the data. The data show nsp16 levels can compensate for E3 overexpression, but not that the E3 ligases are modulating this activity. We can infer this from previous experiments. Perhaps increasing the NSP12 levels would also have the same effect as they don't show that this is specific to NSP16. What about a catalytically dead E3?

      Thank you for the reviewer's thoughtful suggestion. We have revised the wording accordingly and performed viral infection experiments with catalytically inactive E3 mutants (Figure 6—figure supplement 2).

      (14) Figure 6: In panel H the MW for UBR5 is incorrect; it should be around 300 kDa.

      Thank you for the reviewer's detailed suggestions. We have made the necessary revisions in Figure 6H.

      (15) Line 267: "suggesting a more conserved sequence". What are the authors referring to? More conserved than what? This section would benefit from a discussion of which residues are mutated. Are they potential Ub sites, which could point to differential degradation by the E3s as due to more ubiquitination? Or rather to more efficient interaction with the E3? Is this conserved in related CoVs: original SARS and MERS, for instance?

      Thank you for the reviewer’s detailed suggestions. In this context, by “conservation,” we refer to the relative conservation of nsp16 proteins across different subtypes of the Omicron variant. We found that most of the mutation sites contained only 1 to 2 mutations. Additionally, we have constructed and validated multiple-mutant nsp16 proteins, which are still degraded by MARCHF7 or UBR5. Given the ongoing prevalence of the Omicron variant, we aim to explore the broad-spectrum degradation and antiviral effects of these two E3 ligases. While it would be ideal if these experiments could aid in identifying the ubiquitination sites, we have not yet identified any mutant forms that escape degradation. We also compared the nsp16 proteins of several other coronaviruses (such as human coronaviruses 229E, HKU1, MERS-CoV, NL63, OC43, and SARS-CoV-1), and found that these viruses' nsp16 proteins are not highly conserved. As a result, we have not further investigated whether MARCHF7 or UBR5 regulate the nsp16 proteins of these viruses.

      (16) Line 347: 2C of what virus?

      Thank you for the reviewer’s careful reading. We have made the necessary additions to address this point in line 357.

      (17) Line 890: "Scale bars, 25 mm". Should it be 25nm?

      Thank you for your feedback! We found an error in the unit labeling and have corrected it in line 904. We appreciate your careful reading.

      Reviewer #2 (Recommendations for the authors):

      (1) In Figure 6, the authors found that increasing amounts of nsp16 restored the replication of SARS-CoV-2 in the presence of MARCHF7 or UBR5. The authors should discuss the possibility that nsp16 may stimulate viral replication regardless of these E3 ligases, or provide evidence to further clarify this.

      Thank you for your thoughtful suggestion! Given that nsp16 itself is functionally important, this is a valuable consideration. In Figure 6—figure supplement 2A–H, we transfected catalytically inactive E3 mutants and reintroduced nsp16. The results showed that, in the absence of active MARCHF7 or UBR5 antiviral function, overexpression of nsp16 did not promote viral replication, although the RNA levels of the M protein slightly increased. Therefore, in our experiments, excess nsp16 did not significantly stimulate viral replication.

      (2) In Figure 7, the in vivo data supports the function of both E3 ligases to reduce viral infectivity. Is it possible that tail vein injection of naked plasmid DNA may stimulate the innate immune system, e.g., induce IFN as a DNA vaccine, which may contribute to the inhibitory effect? The authors are suggested to discuss or address it.

      Upon reviewing the relevant literature, we found that the hydrodynamic gene delivery (HGD) method using naked DNA is both highly efficient and associated with a low risk of triggering immune responses or oncogenesis. Studies have shown that HGD only weakly activates host immunity (PMID: 37111597), making this less of a concern than with other gene delivery methods. Although strong immune responses have been reported following the injection of naked DNA (e.g., Otc cDNA) in a human trial, no such responses were observed in 17 other participants. This suggests that the immune reactions observed in some cases may be due to individual variability or to limitations of animal models, which may not fully translate to human trials.

      Based on these findings, we believe that the antiviral effects observed in our study are primarily attributable to the intrinsic properties and functions of the E3 ligases. Furthermore, it has been reported that mice and non-human primates exhibit significantly greater resistance to innate immune activation compared to humans. This highlights the challenges in translating these findings into effective antiviral therapeutics and underscores the need for further research in this area. We have incorporated the requested discussion into the manuscript in lines 393-410.

      (3) The authors should move some of the key data in the supplementary figures to the main text, such as the data showing that UBR5 and MARCHF7 mediate broad-spectrum degradation of nsp16 variants and that SARS-CoV-2 infection decreases UBR5 and MARCHF7 expression. This would make the study easier for readers to follow.

      Thank you for your valuable suggestion regarding the organization of our manuscript. In response to your feedback, we have moved the study on nsp16 variants to Figure 6—figure supplement 3. Additionally, the data showing changes in UBR5 and MARCHF7 levels following viral infection have been added as supplementary data in Figure 6—figure supplement 4.

      (4) The diagrammatic sketches in Figures 1E, S1A and B, 7A, and 8 had low resolutions. Please change them to higher resolutions. Moreover, please state the licensing rights of these diagrammatic sketches.

      Thank you for your detailed review! In response to your comment, we have improved the resolution of Figures 1E, S1A and B, 7A, and 8. Additionally, we have specified the drawing tools and source websites in the figure legends (lines 794, 813, 999, and 1013), and we have obtained the necessary licenses for each diagram.

      Figure 1E: Created in BioRender. Li, Z. (2025) https://BioRender.com/h43f612

      Figure S1B: Created in BioRender. Li, Z. (2025) https://BioRender.com/b98t559

      Figure 7A: Created in BioRender. Li, Z. (2025) https://BioRender.com/e76g512

      Figure 8: Created in BioRender. Li, Z. (2025) https://BioRender.com/o84p897

      (5) The authors suggested that both UBR5 and MARCHF7 had a function in triggering the degradation of NSP16, however, the expression of UBR5 but not MARCHF7 was shown to be associated with the severity of clinical symptoms. Further, why did the host evolve 2 kinds of E3 ligases to adjust only 1 viral target? Please discuss them.

      Thank you for your insightful comments. We acknowledge that the limited number of patients with varying degrees of illness in our study could potentially mask some of the observed phenomena. Additionally, individual variability may also play a significant role, which highlights the challenges in translating findings from animal models to human trials.

      Regarding the presence of two E3 ligases targeting the same substrate, we view this as part of an evolutionary arms race between the host and the virus. Viruses evolve mechanisms to counteract the host’s antiviral responses, while the host, in turn, develops multiple pathways and strategies to combat viral infection. This dynamic may explain why multiple E3 ligases regulate the levels of the same factor, reflecting the host’s complex and redundant antiviral defense mechanisms. We have incorporated the requested discussion into the manuscript in lines 359-362.

      (6) Please standardize the symbol size of the bar charts in the same figure, just like in Figures 1D and 5.

      Thank you for your constructive suggestion. We have standardized the symbol sizes of the bar charts in the figure as per your recommendation, ensuring consistency across all panels.

      (7) The use of English could be improved.

      Thank you for your feedback regarding the language. We have carefully reviewed the manuscript and made revisions to improve the clarity and fluency of the English.

      Reviewer #3 (Recommendations for the authors):

      Major points:

      (1) In Figure 1: The expression level of NSP6, 10, 11, and 12 is weak. Include a higher-exposure blot (right next to these blots, marked as higher exposure) to show the expression of these plasmids. Here, the NSP12 plasmid shows no expression, so it is difficult to conclude the effect of MG132 from this blot. It would be appropriate to show the molecular weight of each gene fragment, since some of the plasmids show multiple bands. Please verify the densitometric analysis: the NSP4 (+/- MG132) blot and its densitometric analysis do not correlate. Figure 1B: It is recommended to include an appropriate control (media only) for NH4Cl. The DMSO control serves well for the drugs, but not for ammonium chloride. In Figure 1C, how did the authors arrive at the 15-hour time point? The correlation does not appear as the authors claim. Where is the 15-hour sampling time point for the MG132 or CHX chase? The experimental approach to screen the E2/E3 Ub ligases is appreciated.

      Thank you for your valuable feedback! Regarding your questions, we have made the following revisions:

      On the expression of nsp6, nsp10, nsp11, and nsp12 in Figure 1:

      We have replaced the blots for nsp10, nsp11, and nsp12 with higher exposure blots. However, due to the strong expression of NSP14, we were unable to generate a higher exposure blot for nsp6. Based on the current exposure, it is clear that nsp6 is not regulated by the proteasome. Additionally, in the high-exposure blot for nsp12, we were able to observe its expression and found that this protein is weakly regulated by MG132. Following your suggestion, we have labeled the molecular weights of the proteins in the figure.

      On the densitometric analysis of nsp4 protein:

      We recalculated the densitometric analysis for nsp4 and found no issues. Although the band intensities do not show large changes, the relative fold changes appear more pronounced because we normalized the data using GAPDH as an internal control. We have added a detailed description to the figure legend.
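      To make the normalization step concrete, here is a minimal sketch of a ratio-of-ratios calculation, assuming each band is first divided by its own GAPDH band and then expressed relative to the control lane. The response does not spell out the exact formula, and the function name and intensity values below are hypothetical. The sketch shows how a modest raw change can become a larger fold change once loading differences are corrected.

```python
def relative_fold_change(target, loading, target_ctrl, loading_ctrl):
    """Hypothetical helper: fold change of a target band versus a control lane,
    after normalizing each band to its own loading control (e.g., GAPDH)."""
    for value in (target, loading, target_ctrl, loading_ctrl):
        if value <= 0:
            raise ValueError("band intensities must be positive")
    # Normalize each lane to its loading control, then take the ratio of lanes.
    return (target / loading) / (target_ctrl / loading_ctrl)


# Hypothetical ImageJ intensities: the raw target signal differs only 1.2-fold
# (1200 vs. 1000), but the treated lane was under-loaded (GAPDH 800 vs. 1000).
print(relative_fold_change(1200, 800, 1000, 1000))  # 1.5
```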

      On the NH4Cl control:

      In this experiment, ammonium chloride was dissolved in DMSO. We reviewed the solubility data and found that ammonium chloride has a solubility of 50 mg/ml in DMSO, which is sufficient to reach the concentrations used in our experiment. While the solubility is higher in water, we believe that DMSO is an appropriate solvent for this compound in our context.

      On the 15-hour time point in Figure 1C:

      Regarding the 15-hour time point mentioned in Figure 1C, we did not collect samples at that time. We performed semi-quantitative analysis of protein levels at different time points using ImageJ and estimated the half-life time point based on the half-life calculation formula. Thank you for your suggestion; we will clarify this in the figure legend.
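      For readers unfamiliar with this kind of estimate, a minimal sketch follows, assuming first-order (exponential) decay of the normalized band intensities during the chase. The response does not state which half-life formula was used, so the log-linear least-squares fit, the function name, and the time points below are illustrative assumptions. Because the half-life is read off the fitted decay rather than a measured lane, it can fall between sampled time points, as with the 15-hour estimate.

```python
import math


def half_life(times_h, rel_levels):
    """Estimate a half-life (hours) from a chase time course, assuming
    first-order decay: level(t) = level(0) * exp(-k * t).

    ln(level) is fitted against time by ordinary least squares;
    the half-life is ln(2) / k, where k is the negative of the fitted slope.
    """
    if len(times_h) != len(rel_levels) or len(times_h) < 2:
        raise ValueError("need at least two paired time points")
    if any(v <= 0 for v in rel_levels):
        raise ValueError("levels must be positive to take logarithms")
    ys = [math.log(v) for v in rel_levels]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
    k = -sxy / sxx  # decay constant per hour
    if k <= 0:
        raise ValueError("no decay detected; half-life is undefined")
    return math.log(2) / k


# Hypothetical chase sampled at 0, 4, 8 and 12 h (no 15 h sample); levels are
# normalized to t = 0 and generated with a true half-life of 15 h.
times = [0, 4, 8, 12]
levels = [2 ** (-t / 15) for t in times]
print(round(half_life(times, levels), 2))  # 15.0
```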

      Once again, thank you for your thoughtful review and constructive suggestions. We have made the necessary revisions and improvements to the figures based on your feedback.

      (2) In Figure 2: I do not find a reason to include DMSO control in the siRNAs for E2/E3 Ub. Please justify why it is necessary. It is requested to include WB for the siRNA-treated samples. It is strongly recommended to show the WB data for siRNA-treated samples because you are showing siRNA treatment of MARCHF7 in shUBR5 cells and vice versa. However, if antibodies for corresponding targets are not available, qPCR can be shown in graphical representation in supplementary data indicating the siRNA target region and qPCR target. Show a graphical representation of domains/ deleted regions of MARCHF7 and UBR5.

      Thank you for your valuable feedback! We have addressed your concerns as follows:

      On the inclusion of the DMSO control group:

      The DMSO group was initially included as a control for the MG132-treated group. By comparing with the MG132 group, we aimed to observe whether nsp16 levels were restored by MG132 treatment. Additionally, in siRNA knockdown experiments, the DMSO group was included to compare nsp16 protein levels after knockdown with those in the NC group, as well as to assess differences in nsp16 restoration between MG132 treatment and factor knockdown. However, we acknowledge some issues in the control design. To address this, we have redesigned and conducted the experiments with improved controls (Figure 2A).

      On validating knockdown efficiency:

      We have included Western blot data for UBR5 and MARCHF7 knockdown efficiencies. For other factors where specific antibodies were unavailable, we followed your suggestion and provided graphical representations in the Appendix-figure S1, illustrating the siRNA target regions and qPCR target sites to confirm knockdown specificity and efficiency.

      (3) In Figure 4A: Please describe in detail how this IP was done. What was the transfection time of this plasmid? Is the transfection time different from that of NSP16 in Figure 1A, which shows a significant degradation of NSP16? Please discuss this in detail. It is recommended that this IP be done in +/- MG132. Since you have used siRNA and performed an IP, it is recommended to repeat the IP (with +/- MG132) using the MARCHF7 and UBR5 plasmids.

      Thank you for your detailed review and suggestions! We have addressed your concerns as follows:

      On the specific protocol for the co-IP in Figure 3A:

      The detailed protocol for the immunoprecipitation (IP) experiment is as follows: on day 1, cells were plated, and on day 2, we co-transfected nsp16 and Ub expression plasmids. After 32 hours of transfection, we treated the cells with MG132 for 16 hours, then harvested the cells for IP. We included MG132 treatment in all ubiquitination IP experiments because, without MG132, nsp16 would be degraded, preventing us from observing changes in ubiquitination levels. We apologize for not clearly labeling this in the figure, and we have made the necessary modifications.

      On the use of MG132 and NSP16 degradation:

      Following your suggestion, we have clarified the use of MG132 in the IP experiments, which differs from the degradation of nsp16 shown in Figure 1A. In Figure 1A, we show the degradation of nsp16 in the absence of MG132 treatment.

      On the overexpression of UBR5 and MARCHF7:

      The effect of overexpressing UBR5 or MARCHF7 on ubiquitination has been validated in Figure 4 supplement 2. In these experiments, we explored the effect of UBR5 activity domain inactivation on nsp16 ubiquitination, as well as the effect of MARCHF7 truncation on nsp16 ubiquitination modification. In these experiments, overexpression of the wild-type E3 ligases was also included, and the results yielded the same conclusions as those from the E3 knockdown experiments, thereby validating the robustness of our findings.

      (4) In Figure 4C: Appropriate controls are missing. The authors claim that NSP16 is ubiquitinated and degraded by UBR5 and MARCHF7 via K27 and K48 chains, but there is no NSP16-only control, and without a control lacking NSP16 transfection, the NSP16 signal cannot be compared. I suggest that the authors repeat these individual controls in both the presence and absence of MG132.

      Thank you for your careful review and valuable suggestion! In response to your comment, we have redesigned the experiment and added a control group without nsp16 transfection. We have repeated the validation in the presence of MG132. Without MG132 treatment, nsp16 is degraded, leading to very low protein levels, making it difficult to observe the phenomenon. We have updated the figure accordingly and made the necessary adjustments based on your suggestion (Figure 3E-F).

      (5) In my opinion, Figure 8 needs modification. It is requested to show the levels of strand-specific viral mRNA under UBR5 and MARCHF7 knockdown in +/- MG132. This figure should also be supported by WB indicating the level of NSP16 (capping activity) and of any of the viral proteins. This may validate that if the capping activity is lost, viral translation is affected and hence the virus titre is reduced. Alternatively, the figure can be modified by putting a sub-heading box over the 7mGppA-RNA section and marking it as a future direction/hypothesis.

      Thank you for your thorough and thoughtful review! Regarding the modification of Figure 8, we completely agree with your suggestion. Currently, examining the impact of viral RNA cap modification is technically challenging for us. Therefore, we have followed your advice and marked the investigation of how nsp16 degradation affects viral RNA cap structures as a future direction/hypothesis in the schematic of Figure 8. This revision helps provide direction for future experiments and enhances the clarity of the figure. Thank you for your thoughtful consideration and valuable suggestion!

      Minor points:

      (1) Figure 2A: Align NSP16 Blot to actin.

      Thank you for your constructive feedback! We have redesigned the experiment and included an MG132 treatment group in Figure 2A. Consequently, the figure has been revised comprehensively, and the nsp16 blot has been aligned with tubulin.

      (2) Figure 2C: It is recommended to properly align the lanes where the pLKO and shRNA labelling are overlapping.

      Thank you for your thoughtful suggestion! We have revised Figure 2C based on your recommendation to ensure that the pLKO and shRNA labeling no longer overlap. We sincerely apologize for any confusion this may have caused and appreciate your understanding and support.

      (3) Just a curious question: what happens if we silence both UBR5 and MARCHF7 and check the virus titre? This is additional work, so if the authors do not agree, it is ok.

      Thank you for your valuable suggestion! Regarding your question about silencing both UBR5 and MARCHF7, we indeed attempted to generate knockout cell lines, but unfortunately, we were not successful at this stage. We plan to explore alternative methods to establish stable knockout cell lines in our future experiments. Meanwhile, as shown in Figure 5 supplement 1, we have performed experiments where both UBR5 and MARCHF7 were knocked down simultaneously, followed by infection with virus-like particles. The results indicate that dual knockdown further enhances viral replication. These findings may partially address your question. Thank you again for your insightful suggestion!

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      The authors investigated causal inference in the visual domain through a set of carefully designed experiments and sound statistical analysis. They suggest that the early visual system makes a crucial contribution to the computations supporting causal inference.

      Strengths: 

      I believe the authors target an important problem (causal inference) with carefully chosen tools and methods. Their analysis rightly implies the specialization of visual routines for causal inference and the crucial contribution of early visual systems to perform this computation. I believe this is a novel contribution and their data and analysis are in the right direction. 

      Weaknesses: 

      In my humble opinion, a few aspects deserve more attention: 

      (1) Causal inference (or causal detection) in the brain should be quite fundamental and quite important for human cognition/perception. Thus, the underlying computation and neural substrate might not be limited to the visual system (I don't mean the authors did claim that). In fact, to the best of my knowledge, multisensory integration is one of the best-studied perceptual phenomena that has been conceptualized as a causal inference problem.

      Assuming that causal inference in those studies (Shams 2012; Shams and Beierholm 2022; Kording et al. 2007; Aller and Noppeney 2018; Cao et al. 2019) (and many more, e.g., by Shams and colleagues) and in the current study share some attributes, one would expect some findings from those domains to be transferable (at least to some degree) here as well. Most importantly, the neural correlates that have already been suggested on the basis of animal studies and invasive recordings might be relevant here as well.

      Perhaps the most relevant one is the recent work from the Harris group on mice (Coen et al. 2021). I should emphasize that I don't claim they are necessarily relevant, only that they can be relevant, given their common roots in the problem of causal inference in the brain. This is a critical topic that the authors may want to discuss in their manuscript.

      We thank the reviewer. We addressed this point of the public review in our reply to the reviewer’s suggestions (and add it here again for convenience). The literature on the role of occipital, parietal and frontal brain areas in causal inference is also addressed in the response to point 3 of the public review.

      “We used visual adaptation to carve out a bottom-up visual routine for detecting causal interactions in the form of launching events. However, we know that more complex behaviors of perceiving causal relations can result from integrating information across space (e.g., in causal capture; Scholl & Nakayama, 2002), across time (postdictive influence; Choi & Scholl, 2006), and across sensory modalities (Sekuler, Sekuler, & Lau, 1997). Bayesian causal inference has been particularly successful as a normative framework to account for multisensory integration (Körding et al., 2007; Shams & Beierholm, 2022). In that framework, the evidence for a common-cause hypothesis competes with the evidence for an independent-causes hypothesis (Shams & Beierholm, 2022). The task in our experiments could be similarly formulated as two competing hypotheses for the second disc’s movement (i.e., the movement was caused by the first disc vs. the movement occurred autonomously). This framework also emphasizes the distributed nature of the neural implementation for solving such inferences, showing the contributions of parietal and frontal areas in addition to sensory processing (for review see Shams & Beierholm, 2022). Moreover, even visual adaptation to contrast in mouse primary visual cortex is influenced by top-down factors such as behavioral relevance, suggesting a complex implementation of the observed adaptation results (Keller et al., 2017). The present experiments, however, presented purely visual events that do not require an integration across processing domains. Thus, the outcome of our suggested visual routine can provide initial evidence from within the visual system for a causal relation in the environment that may then be integrated with signals from other domains (e.g., auditory signals). Determining exactly how the perception of causality relates to mechanisms of causal inference and the neural implementation thereof is an exciting avenue for future research.
Note, however, that perceived causality can be distinguished from judged causality: Even when participants are aware that a third variable (e.g., a color change) is the best predictor of the movement of the second disc in launching events, they still perceive the first disc as causing the movement of the second disc (Schlottmann & Shanks, 1992).”
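      To illustrate the competition between hypotheses described above, a minimal sketch of the Bayes' rule step used in Bayesian causal inference (as in Körding et al., 2007) follows. Mapping the common-cause hypothesis onto "the first disc caused the second disc’s movement" and the independent-causes hypothesis onto "the second disc moved autonomously" is an illustrative assumption here, as are the function name and the likelihood and prior values; this is not a model fitted to the reported data.

```python
def posterior_common_cause(lik_common, lik_independent, prior_common):
    """Posterior probability of the common-cause hypothesis (C = 1):

    P(C=1 | x) = p(x|C=1) P(C=1) / [p(x|C=1) P(C=1) + p(x|C=2) (1 - P(C=1))]

    lik_common      -- likelihood of the observed event under C = 1 (hypothetical)
    lik_independent -- likelihood of the observed event under C = 2 (hypothetical)
    prior_common    -- prior probability of a common cause, P(C=1)
    """
    if not 0.0 <= prior_common <= 1.0:
        raise ValueError("prior_common must lie in [0, 1]")
    if lik_common < 0 or lik_independent < 0:
        raise ValueError("likelihoods must be non-negative")
    evidence_common = lik_common * prior_common
    evidence_independent = lik_independent * (1.0 - prior_common)
    total = evidence_common + evidence_independent
    if total == 0:
        raise ValueError("observation has zero probability under both hypotheses")
    return evidence_common / total


# Hypothetical values: the event is three times as likely under a launch
# (common cause) as under autonomous motion, with a flat prior.
print(posterior_common_cause(0.6, 0.2, 0.5))  # about 0.75
```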

      (2) If I understood correctly, the authors are arguing in favor of a purely bottom-up contribution of early sensory areas to causal inference (for instance, when they wrote "the specialization of visual routines for the perception of causality at the level of individual motion directions raises the possibility that this function is located surprisingly early in the visual system *as opposed to a higher-level visual computation*."). Certainly, as the authors suggested, early sensory areas make a crucial contribution; however, it may not be limited to that. Recent studies increasingly conceptualize perception as an active process in which top-down cognitive contributions also weigh in strongly. For instance, the simplest cases of perception have been conceptualized along this line (Martin, Solms, and Sterzer 2021), as have some visual illusions (Safavi and Dayan 2022) and other extensions (Kay et al. 2023). Thus, I believe it would be helpful to extend the discussion of the top-down and cognitive contributions to causal inference (of course, these can also just be hinted at, based on recent developments). Even adaptation, which is central to this study, can be influenced by top-down factors (Keller et al. 2017). I believe, based on other work by Rolfs and colleagues, that this is also aligned with their overall perspective on vision.

      Indeed, we assessed bottom-up contributions to the perception of a causal relation. We agree with the reviewer that in more complex situations, for instance, in the presence of contextual influences or additional auditory signals, the perception of a causal relation may not be limited to bottom-up vision. While we had acknowledged this in the original manuscript (see excerpts below), we now make it even more explicit:

      “[…] we know that more complex behaviors of perceiving causal relations can result from integrating information across space (e.g., in causal capture; Scholl & Nakayama, 2002), across time (postdictive influence; Choi & Scholl, 2006), and across sensory modalities (Sekuler, Sekuler, & Lau, 1997).”

      “[…] Neurophysiological studies support the view of distributed neural processing underlying sensory causal interactions with the visual system playing a major role.”

      “[…] Interestingly, single cell recordings in area F5 of the primate brain revealed that motor areas are contributing to the perception of causality (Caggiano et al., 2016; Rolfs, 2016), emphasizing the distributed nature of the computations underlying causal interactions. This finding also stresses that the detection, and the prediction, of causality is essential for processes outside sensory systems (e.g., for understanding others’ actions, for navigating, and for avoiding collisions). The neurophysiology subserving causal inference further extends the candidate cortical areas that might contribute to the detection of causal relations, emphasizing the role of the frontal cortex for the flexible integration of multisensory representations (Cao et al., 2019; Coen et al., 2023).”

      However, there is also ample evidence that the perception of a simple causal relation, as we studied it in our experiments, escapes top-down cognitive influences. The perception of causality in launching events is described as automatic and irresistible, meaning that participants have the spontaneous impression of a causal relation and typically do not voluntarily switch between a causal and a noncausal percept. This irresistibility has led several authors to discuss a modular organization underlying the detection of such events (Michotte, 1963; Scholl & Tremoulet, 2000). This view is further supported by a study that experimentally manipulated the contingencies between the movements of the two discs (Schlottmann & Shanks, 1992). In one condition, the authors created a launching event where the second disc’s movement was perfectly correlated with a color change, but only sometimes coincided with the first disc’s movement offset. Nevertheless, participants reported seeing that the first disc caused the movement of the second disc (regardless of the stronger statistical relationship with the color change). However, when asked to make conscious causal judgments, participants were aware of the color change as the true cause of the second disc’s motion, thereby recognizing its more reliable correlation. This study strongly suggests that perceived and judged causality (i.e., cognitive causal inference) can be dissociated (Schlottmann & Shanks, 1992). We have added this reference in the revised manuscript. Overall, we argue that our study focused on a visual routine that could be implemented in a simple bottom-up fashion, but we acknowledge throughout the manuscript that in a more complex situation (e.g., integrating information from other sensory domains) the implementation could be realized in a more distributed fashion, including top-down influences as in multisensory integration.
However, it is important to stress that these potential top-down influences would be automatic and should not be confused with voluntary cognitive influences.

      “Note, however, that perceived causality can be distinguished from judged causality (Schlottmann & Shanks, 1992). Even when participants are aware that a third variable (e.g., a color change) is the best predictor of the movement of the second disc in launching events, they still perceive the first disc as causing the movement of the second disc (Schlottmann & Shanks, 1992).”

      (3) The authors rightly implicate the neural substrate of causal inference in the early sensory system. Given their study is pure psychophysics, a more elaborate discussion based on other studies that used brain measurements is needed (in my opinion) to put into perspective this conclusion. In particular, as I mentioned in the first point, the authors mainly discuss the potential neural substrate of early vision, however much has been done about the role of higher-tier cortical areas in causal inference e.g., see (Cao et al. 2019; Coen et al. 2021). 

      In the revised manuscript, we addressed the limitations of a purely psychophysical approach and acknowledged alternative implementations in the Discussion section.

      “Note that, while the present findings demonstrate direction-selectivity, it remains unclear where exactly that visual routine is located. As pointed out, it is also possible that the visual routine is located higher up in the visual system (or distributed across multiple levels) and only uses a direction-selective population response as input.”

      Moreover, we also cite the two suggested papers when referring to the role of cortical areas in causal inference (Cao et al., 2019; Coen et al., 2023):

      “Neurophysiological studies support the view of distributed neural processing underlying sensory causal interactions, with the visual system playing a major role. Imaging studies in particular revealed a network for the perception of causality that is also involved in action observation (Blakemore et al., 2003; Fonlupt, 2003; Fugelsang et al., 2005; Roser et al., 2005). The fact that visual adaptation of causality occurs in a retinotopic reference frame emphasizes the role of retinotopically organized areas within that network (e.g., V5 and the superior temporal sulcus). Interestingly, single cell recordings in area F5 of the primate brain revealed that motor areas are contributing to the perception of causality (Caggiano et al., 2016; Rolfs, 2016), emphasizing the distributed nature of the computations underlying causal interactions, and also stressing that the detection, and the prediction, of causality is essential for processes outside purely sensory systems (e.g., for understanding others’ actions, for navigating, and for avoiding collisions). The neurophysiological underpinnings of causal inference further extend the candidate cortical areas that might contribute to the detection of causal relations, emphasizing the role of the frontal cortex for the flexible integration of multisensory representations (Cao et al., 2019; Coen et al., 2023).”

      There were many areas in this manuscript that I liked: clever questions, experimental design, and statistical analysis.

      Thank you so much.

      Reviewer #1 (Recommendations for the authors):

      I congratulate the authors again on their manuscript and hope they will find my review helpful. Most of my notes are suggestions to the authors, and I hope will help them to improve the manuscript. None are intended to devalue their (interesting) work. 

      We would like to thank the reviewer for their thoughtful and encouraging comments.

      In the following, I use the pX-lY template to refer to a particular page number X (pX) and line number Y (lY).

      Major concerns and suggestions 

      - I would suggest simplifying the abstract and significance statement or putting more background in it. It's hard (at least for me) to understand if one is not familiar with the task used in this study. 

      We followed the reviewer’s suggestion and added more background at the beginning of the abstract.

      We made the following changes:

      “Detecting causal relations structures our perception of events in the world. Here, we determined for visual interactions whether generalized (i.e., feature-invariant) or specialized (i.e., feature-selective) visual routines underlie the perception of causality. To this end, we applied a visual adaptation protocol to assess the adaptability of specific features in classical launching events of simple geometric shapes. We asked observers to report whether they observed a launch or a pass in ambiguous test events (i.e., the overlap between two discs varied from trial to trial). After prolonged exposure to causal launch events (the adaptor) defined by a particular set of features (i.e., a particular motion direction, motion speed, or feature conjunction), observers were less likely to see causal launches in subsequent ambiguous test events than before adaptation. Crucially, adaptation was contingent on the causal impression in launches as demonstrated by a lack of adaptation in non-causal control events. We assessed whether this negative aftereffect transfers to test events with a new set of feature values that were not presented during adaptation. Processing in specialized (as opposed to generalized) visual routines predicts that the transfer of visual adaptation depends on the feature-similarity of the adaptor and the test event. We show that negative aftereffects do not transfer to unadapted launch directions but do transfer to launch events of different speed. Finally, we used colored discs to assign distinct feature-based identities to the launching and the launched stimulus. We found that the adaptation transferred across colors if the test event had the same motion direction as the adaptor. In summary, visual adaptation allowed us to carve out a visual feature space underlying the perception of causality and revealed specialized visual routines that are tuned to a launch’s motion direction.”

      - The authors highlight the importance of studying causal inference and understanding the underlying mechanisms by probing adaptation, however, their introduction justifying that is, in my humble opinion, quite short. Perhaps in the cited paper, this is discussed extensively, but I'd suggest providing some elaboration in the manuscript. Otherwise, the study would be very specific to certain visual phenomena, rather than general mechanisms.  

      We have carefully considered the reviewer’s set of comments and concerns (e.g., the role of top-down influences, the contributions of the frontal cortex, and illustration of the computational level). They all appear to share the theme that the reviewer looks at our study from the perspective of Bayesian inference. We conducted the current study in the tradition of classical phenomena in the field of the perception of causality (in the tradition of Michotte, 1963, and as reviewed in Scholl & Tremoulet, 2000), which aims to uncover the relevant visual parameters and rules for detecting causal relations in the visual domain. Indeed, we think that a causal inference perspective promises many new insights into the mechanisms underlying the classical phenomena described for the perception of causality. In the revised manuscript, we therefore discuss causal inference and how it relates to the current study. We now emphasize a) that in our study we used visual adaptation to reveal the bottom-up processes that allow for the detection of a causal interaction in the visual domain, b) that the perception of causality also integrates signals from other domains (which we do not study here), and c) that the neural substrates underlying the perception of causality might be best described by a distributed network. By discussing Bayesian causal inference, we point out promising avenues for future research that may bridge the fields of the perception of causality and Bayesian causal inference. However, we also emphasize that perceived causality and judged causality can be dissociated (Schlottmann & Shanks, 1992).

      We added the following discussion:

      “We used visual adaptation to carve out a bottom-up visual routine for detecting causal interactions in the form of launching events. However, we know that more complex behaviors of perceiving causal relations can result from integrating information across space (e.g., in causal capture; Scholl & Nakayama, 2002), across time (postdictive influence; Choi & Scholl, 2006), and across sensory modalities (Sekuler, Sekuler, & Lau, 1997). Bayesian causal inference has been particularly successful as a normative framework to account for multisensory integration (Körding et al., 2007; Shams & Beierholm, 2022). In that framework, the evidence for a common-cause hypothesis competes with the evidence for an independent-causes hypothesis (Shams & Beierholm, 2022). The task in our experiments could be similarly formulated as two competing hypotheses for the second disc’s movement (i.e., the movement was caused by the first disc vs. the movement occurred autonomously). This framework also emphasizes the distributed nature of the neural implementation for solving such inferences, showing the contributions of parietal and frontal areas in addition to sensory processing (for review see Shams & Beierholm, 2022). Moreover, even visual adaptation to contrast in mouse primary visual cortex is influenced by top-down factors such as behavioral relevance, suggesting a complex implementation of the observed adaptation results (Keller et al., 2017). The present experiments, however, presented purely visual events that do not require an integration across processing domains. Thus, the outcome of our suggested visual routine can provide initial evidence from within the visual system for a causal relation in the environment that may then be integrated with signals from other domains (e.g., auditory signals). Determining exactly how the perception of causality relates to mechanisms of causal inference and the neural implementation thereof is an exciting avenue for future research.
Note, however, that perceived causality can be distinguished from judged causality: Even when participants are aware that a third variable (e.g., a color change) is the best predictor of the movement of the second disc in launching events, they still perceive the first disc as causing the movement of the second disc (Schlottmann & Shanks, 1992).”

      - I'd suggest setting the context at the outset, namely that your study of causal inference in the brain specifically targets the visual domain, and, if you like, connecting it better in the discussion to general ideas about causal inference in the brain (like the works by Ladan Shams and colleagues).

      We would like to thank the reviewer for this comment. We followed the reviewer’s suggestion and made clear from the beginning that this paper is about the detection of causal relations in the visual domain. In the revised manuscript we write:

      “Here, we will study the mechanisms underlying the computations of causal interactions in the visual domain by capitalizing on visual adaptation of causality (Kominsky & Scholl, 2020; Rolfs et al., 2013). Adaptation is a powerful behavioral tool for discovering and dissecting a visual mechanism (Kohn, 2007; Webster, 2015) that provides an intriguing testing ground for the perceptual roots of causality.”

      As described in our reply to the previous comment, we now also discuss the ideas about causal inference.

      - To better illustrate the implication of your study on the computational level, I'd suggest putting it in the context of recent approaches to perception (point 2 of my public review). I think this is also aligned with the comment of Reviewer#3 on your line 32 (recommendation for authors).  

      In the revised manuscript, we now discuss the role of top-down influences in causal inference when addressing point 2 of the reviewer’s public review.

      Minor concerns and suggestions 

      - On p2-l3, I'd suggest providing a few examples of generalized and/or specialized visual routines (given the importance of the abstract). I only got it halfway through the introduction.

      We thank the reviewer for highlighting the need to better introduce the concept of a visual routine. We have chosen the term visual routine to emphasize that we locate the part of the mechanism that is affected by the adaptation in our experiments in the visual system. At the same time, the concept leaves space with respect to the extent to which the mechanism further involves mid- and higher-level processes. In the revised manuscript, we now refer to Ullman (1987) who introduced the concept of a visual routine—the idea of a modular operation that sequentially processes spatial and feature information. Moreover, we refer to the concept of attentional sprites (Cavanagh, Labianca, & Thornton, 2001)—attention-based visual routines that allow the visual system to semi-independently handle complex visual tasks (e.g., identifying biological motion).

      We added the following footnote to the introduction:

      “We use the term visual routine here to highlight that our adaptation experiments can reveal a causality detection mechanism that resides in the visual system. At the same time, calling it a routine emphasizes similarities with a local, semi-independent operation (e.g., the recognition of familiar motion patterns; see also Ullman, 1987; Cavanagh, Labianca, & Thornton, 2001) that can engage mid- and higher-level processes (e.g., during causal capture, Scholl & Nakayama, 2002; or multisensory integration, Körding et al., 2007).”

      In the abstract we now write:

      “Here, we determined for visual interactions whether generalized (i.e., feature-invariant) or specialized (i.e., feature-selective) visual routines underlie the perception of causality.”

      - On p4-l31, I'd suggest mentioning the Matlab version. I have experienced differences across different versions of Matlab (minor but still ...). 

      We added the Matlab version.

      - On p6-l46, the OSF link (to the repository containing data and code) is missing.

      Thank you. We made the OSF repository public and added the link to the revised manuscript.

      We added the following information to the revised manuscript.

      “The data analysis code has been deposited at the Open Science Framework and is publicly available at https://osf.io/x947m/.”

      Reviewer #2 (Public Review):

      This paper seeks to determine whether the human visual system's sensitivity to causal interactions is tuned to specific parameters of a causal launching event, using visual adaptation methods. The three parameters the authors investigate in this paper are the direction of motion in the event, the speed of the objects in the event, and the surface features or identity of the objects in the event (in particular, having two objects of different colors). The key method, visual adaptation to causal launching, has now been demonstrated by at least three separate groups and seems to be a robust phenomenon. Adaptation is a strong indicator of a visual process that is tuned to a specific feature of the environment, in this case launching interactions. Whereas other studies have focused on retinotopically specific adaptation (i.e., whether the adaptation effect is restricted to the same test location on the retina as the adaptation stream was presented to), this one focuses on feature specificity. 

      The first experiment replicates the adaptation effect for launching events as well as the lack of an adaptation effect for a minimally different non-causal 'slip' event. However, it also finds that the adaptation effect does not work for launching events whose direction of motion differs by more than 30 degrees from the direction of the test event. The interpretation is that the system that is being adapted is sensitive to the direction of this event, which is an interesting and somewhat puzzling result given the methods used in previous studies, which have used random directions of motion for both adaptation and test events.

      The obvious interpretation would be that past studies have simply adapted to launching in every direction, but that in itself says something about the nature of this direction-specificity: it is not working through opposed detectors. For example, in something like the waterfall illusion adaptation effect, where extended exposure to downward motion leads to illusory upward motion on neutral-motion stimuli, the effect simply doesn't work if motion in two opposed directions is shown (i.e., you don't see illusory motion in both directions, you just see nothing). The fact that adaptation to launching in multiple directions doesn't seem to cancel out the adaptation effect in past work raises interesting questions about how directionality is being coded in the underlying process. 

      We would like to thank the reviewer for that thoughtful comment. We added the described implication to the manuscript:

      “While the present study demonstrates direction-selectivity for the detection of launches, previous adaptation protocols demonstrated successful adaptation using adaptors with random motion direction (Rolfs et al., 2013; Kominsky & Scholl, 2020). These results therefore suggest independent direction-specific routines, in which adaptation to launches in one direction does not counteract an adaptation to launches in the opposite direction (as for example in opponent color coding).”

      In addition, one limitation of the current method is that it's not clear whether the motion direction-specificity is also itself retinotopically-specific, that is, if one retinotopic location were adapted to launching in one direction and a different retinotopic location adapted to launching in the opposite direction, would each test location show the adaptation effect only for events in the direction presented at that location? 

      This is an interesting idea! Because previous adaptation studies consistently showed retinotopic adaptation of causality, we would not expect to find transfer of directional tuning for launches to other locations. We agree that the suggested experiment on testing the reference frame of directional specificity constitutes an interesting future test of our findings.

      The second experiment tests whether the adaptation effect is similarly sensitive to differences in speed. The short answer is no; adaptation events at one speed affect test events at another. Furthermore, this is not surprising given that Kominsky & Scholl (2020) showed adaptation transfer between events with differences in speeds of the individual objects in the event (whereas all events in this experiment used symmetrical speeds). This experiment is still novel and it establishes that the speed-insensitivity of these adaptation effects is fairly general, but I would certainly have been surprised if it had turned out any other way. 

      We thank the reviewer for highlighting the link to an experiment reported in Kominsky & Scholl (2020). We report the finding of that experiment now in the revised manuscript.

      We added the following paragraph in the discussion:

      “For instance, we demonstrated a transfer of adaptation across speed for symmetrical speed ratios. This result complements a previous study reporting that adaptation to triggering events (with an asymmetric speed ratio of 1:3) resulted in significant retinotopic adaptation of ambiguous (launching) test events with different speed ratios (i.e., test events with a speed ratio of 1:1 and of 1:3; Kominsky & Scholl, 2020).”

      The third experiment tests color (as a marker of object identity), and pits it against motion direction. The results demonstrate that adaptation to red-launching-green generates an adaptation effect for green-launching-red, provided they are moving in roughly the same direction, which provides a nice internal replication of Experiment 1 in addition to showing that the adaptation effect is not sensitive to object identity. This result forms an interesting contrast with the infant causal perception literature. Multiple papers (starting with Leslie & Keeble, 1987) have found that 6-8-month-old infants are sensitive to reversals in causal roles exactly like the ones used in this experiment. The success of adaptation transfer suggests, very clearly, that this sensitivity is not based only on perceptual processing, or at least not on the same processing that we access with this adaptation procedure. It implies that infants may be going beyond the underlying perceptual processes and inferring genuine causal content. This is also not the first time the adaptation paradigm has diverged from infant findings: Kominsky & Scholl (2020) found a divergence with the object speed differences as well, as infants categorize these events based on whether the speed ratio (agent:patient) is physically plausible (Kominsky et al., 2017), while the adaptation effect transfers from physically implausible events to physically plausible ones. This only goes to show that these adaptation effects don't exhaustively capture the mechanisms of early-emerging causal event representation. 

      We would like to thank the reviewer for highlighting the similarities (and differences) to the seminal study by Leslie and Keeble (1987). We included a discussion with respect to that paper in the revised manuscript. Indeed, that study showed a recovery from habituation to launches after reversal of the launching events. In their study, the reversal condition resulted in a change of two aspects: 1) motion direction and 2) which color is linked to the cause (i.e., agent) or the effect (i.e., patient). Our study, based on visual adaptation in adults, suggests that switching the two colors is not necessary for a recovery from the habituation, provided the motion direction is reversed. Importantly, the reversal of the motion direction only affected the perception of causality after adapting to launches (but not to slip events), which is consistent with Leslie and Keeble’s (1987) finding that the effect of a reversal is contingent on habituation/adaptation to a causal relationship (and is not observed for non-causal delayed launches). Based on our findings, we predict that switching colors without changing the event’s motion direction would not result in a recovery from habituation. Obviously, for infants, color may play a more important role for establishing an object identity than it does for adults, which could explain potential differences. We also agree with the reviewer’s point that the adaptation protocol might tap into different mechanisms than revealed by habituation studies in infants (e.g., Kominsky et al., 2017 vs. Kominsky & Scholl, 2020).

      We revised the manuscript accordingly when discussing the role of direction selectivity in our study:

      “Habituation studies in six-month-old infants also demonstrated that the reversal of a launch resulted in a recovery from habituation to launches (while a non-causal control condition of delayed launches did not; Leslie & Keeble, 1987). In their study, the reversal of motion direction was accompanied by a reversal of the color assignment to the cause-effect relationship. In contrast, our findings suggest that in adults color does not play a major role in the detection of a launch. Future studies should further delineate similarities and differences obtained from adaptation studies in adults and habituation studies in children (e.g., Kominsky et al., 2017; Kominsky & Scholl, 2020).”

      One overarching point about the analyses to take into consideration: The authors use a Bayesian psychometric curve-fitting approach to estimate a point of subjective equality (PSE) in different blocks for each individual participant based on a model with strong priors about the shape of the function and its asymptotic endpoints, and this PSE is the primary DV across all of the studies. As discussed in Kominsky & Scholl (2020), this approach has certain limitations, notably that it can generate nonsensical PSEs when confronted with relatively extreme response patterns. The authors mentioned that this happened once in Experiment 3 and that a participant had to be replaced. An alternate approach is simply to measure the proportion of 'pass' reports overall to determine if there is an adaptation effect. I don't think this alternate analysis strategy would greatly change the results of this particular experiment, but it is robust against this kind of self-selection for effects that fit in the bounds specified by the model, and may therefore be worth including in a supplemental section or as part of the repository to better capture the individual variability in this effect. 

      We largely agree with these points. Indeed, we adopted the non-parametric analysis for a recent series of experiments in which the psychometric curves were more variable (Ohl & Rolfs, Vision Sciences Society Meeting 2024). In the present study, however, the model fits were very convincing. In Figures S1, S2 and S3 we show the model fits for each individual observer and condition on top of the mean proportion of launch reports. The inferential statistics based on the points of subjective equality, therefore, allowed us to report our findings very concisely.
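      For readers who want to see the non-parametric alternative raised by the reviewer in concrete form, a minimal Python sketch follows: it tallies the proportion of launch reports per condition and takes the difference between a baseline and an adapted condition. The (condition, response) data format, the condition labels, and the function names are hypothetical and do not reflect the structure of the analysis code shared on OSF.

```python
from collections import defaultdict


def launch_proportions(trials):
    """Proportion of 'launch' reports per condition.

    trials: iterable of (condition, response) pairs, where response is
    either 'launch' or 'pass' (hypothetical data format).
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [n_launch, n_total]
    for condition, response in trials:
        counts[condition][1] += 1
        if response == "launch":
            counts[condition][0] += 1
    return {cond: n_launch / n_total for cond, (n_launch, n_total) in counts.items()}


def adaptation_effect(trials, baseline="baseline", adapted="adapted"):
    """Drop in the proportion of launch reports after adaptation.

    Positive values mean fewer launch reports after adaptation.
    The condition labels are hypothetical placeholders.
    """
    p = launch_proportions(trials)
    return p[baseline] - p[adapted]
```

      Unlike a PSE, this difference is defined for any response pattern, which is the robustness the reviewer points to; the trade-off is that it pools over all overlap levels rather than locating the shift along the overlap axis.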

      In general, this paper adds further evidence for something like a 'launching' detector in the visual system, but beyond that, it specifies some interesting questions for future work about how exactly such a detector might function. 

      We thank the reviewer for this positive overall assessment.

      Reviewer #2 (Recommendations for the authors):

      Generally, the paper is great. The questions I raised in the public review don't need to be answered at this time, but they're exciting directions for future work. 

      We would like to thank the reviewer for the encouraging comments and thoughtful ideas on how to improve the manuscript.

      I would have liked to see a little more description of the model parameters in the text of the paper itself just so readers know what assumptions are going into the PSE estimation. 

      We followed the reviewer’s suggestion and added more information regarding the parameter space (i.e., ranges of possible parameters of the logistic model) that we used for obtaining the model fits. 

      Specifically, we added the following information in the manuscript:

      “For model fitting, we constrained the range of possible estimates for each parameter of the logistic model. The lower asymptote for the proportion of reported launches was constrained to be in the range 0–0.75, and the upper asymptote in the range 0.25–1. The intercept of the logistic model was constrained to be in the range 1–15, and the slope was constrained to be in the range –20 to –1.”
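      To make these parameter ranges easier to read, here is a minimal Python sketch of one possible four-parameter logistic. It assumes the form p(x) = lower + (upper - lower) / (1 + exp(-(intercept + slope * x))), where x is the disc overlap; this parameterization, the variable x, and all names are illustrative assumptions, and the sketch shows only the model form, not the Bayesian fitting procedure used in the study. Under this form, the function sits halfway between its asymptotes where intercept + slope * x = 0, i.e., at x = -intercept / slope.

```python
import math

# Parameter ranges as quoted above: name -> (lower bound, upper bound).
BOUNDS = {
    "lower": (0.0, 0.75),
    "upper": (0.25, 1.0),
    "intercept": (1.0, 15.0),
    "slope": (-20.0, -1.0),
}


def within_bounds(params):
    """True if every parameter lies inside its allowed range."""
    return all(lo <= params[name] <= hi for name, (lo, hi) in BOUNDS.items())


def p_launch(x, lower, upper, intercept, slope):
    """Assumed logistic: proportion of launch reports at disc overlap x."""
    return lower + (upper - lower) / (1.0 + math.exp(-(intercept + slope * x)))


def pse(intercept, slope):
    """Overlap at which p_launch lies halfway between its asymptotes."""
    return -intercept / slope
```

      Because the slope is restricted to negative values, launch reports fall as overlap increases, and with a positive intercept the resulting PSE is always positive.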

      The models provided very good fits, as can be seen in the fits per individual and experimental condition, which we provide in response to the public comments. Please note that all data and analysis scripts are available at the Open Science Framework (https://osf.io/x947m/).

      I also have a recommendation about Figure 1b: Color-code "Feature A", "Feature B", and "Feature C" and match those colors with the object identity/speed/direction text. I get what the figure is trying to convey but to a naive reader there's a lot going on and it's hard to interpret. 

      We followed the reviewer’s suggestion and revised the visualization accordingly.

      If you have space, figures showing the adaptation and corresponding test events for each experimental manipulation would also be great, particularly since the naming scheme of the conditions is (necessarily) not entirely consistent across experiments. It would be a lot of little figures, I know, but to people who haven't spent as long staring at these displays as we have, they're hard to envision based on description alone. 

      We followed the reviewer’s recommendation and added a visualization of the adaptor and the test events for the different experiments in Figure 2.

      Reviewer #3 (Public Review):

      We thank the reviewer for their thoughtful comments, which we carefully addressed to improve the revised manuscript. 

      Summary: 

      This paper presents evidence from three behavioral experiments that causal impressions of "launching events", in which one object is perceived to cause another object to move, depend on motion direction-selective processing. Specifically, the work uses an adaptation paradigm (Rolfs et al., 2013), presenting repetitive patterns of events matching certain features to a single retinal location, then measuring subsequent perceptual reports of a test display in which the degree of overlap between two discs was varied, and participants could respond "launch" or "pass". The three experiments report results of adapting to motion direction, motion speed, and "object identity", and examine how the psychometric curves for causal reports shift in these conditions depending on the similarity of the adapter and test. While causality reports in the test display were selective for motion direction (Experiment 1), they were not selective for adapter-test speed differences (Experiment 2) nor for changes in object identity induced via color swap (Experiment 3). These results support the notion that causal perception is computed (in part) at relatively early stages of sensory processing, possibly even independently of or prior to computations of object identity.

      Strengths: 

      The setup of the research question and hypotheses is exceptional. The experiments are carefully performed (appropriate equipment, and careful control of eye movements). The slip adaptor is a really nice control condition and effectively mitigates the need to control motion direction with a drifting grating or similar. Participants were measured with sufficient precision, and a power curve analysis was conducted to determine the sample size. Data analysis and statistical quantification are appropriate. Data and analysis code are shared on publication, in keeping with open science principles. The paper is concise and well-written. 

      Weaknesses: 

      The biggest uncertainty I have in interpreting the results is the relationship between the task and the assumption that the results tell us about causality impressions. The experimental logic assumes that "pass" reports are always non-causal impressions and "launch" reports are always causal impressions. This logic is inherited from Rolfs et al (2013) and Kominsky & Scholl (2020), who assert rather than measure this. However, other evidence suggests that this assumption might not be solid (Bechlivanidis et al., 2019). Specifically, "[our experiments] reveal strong causal impressions upon first encounter with collision-like sequences that the literature typically labels "non-causal"" (Bechlivanidis et al., 2019) -- including a condition that is similar to the current "pass". It is therefore possible that participants' "pass" reports could also involve causal experiences. 

      We agree with the reviewer that our study assumes that the launch-pass dichotomy can be mapped onto a dimension of causal to non-causal impressions. Please note that the choice for this launch-pass task format was intentional. We consider it an advantage that subjects do not have to report causal vs non-causal impressions directly, as it allows us to avoid the often-criticized decision biases that come with asking participants about their causal impression (Joynson, 1971; for a discussion see Choi & Scholl, 2006). This obviously comes at the cost that participants did not directly report their causal impression in our experiments. There is, however, evidence that increasing overlap between the discs monotonically decreases the causal impression when directly asking participants to report their causal impression (Scholl & Nakayama, 2004). We believe, therefore, that the assumption of a mapping between launches-to-passes and causal-to-noncausal is well-justified. At the same time, the expressed concern emphasizes the need to develop further, possibly implicit, measures for causal impressions (see Völter & Huber, 2021).

      However, as pointed out by the reviewer, a recent paper demonstrated that on first encounter participants can have impressions in response to a pass event that are different from clearly non-causal impressions (Bechlivanidis et al., 2019). As demonstrated in the same paper, displaying a canonical launch decreased the impression of causality when seeing pass events in subsequent trials. In our study, participants completed an entire training session before running the main experiments. It is therefore reasonable to expect that participants observed passes as non-causal events given the presence of clear causal references. Nevertheless, we now acknowledge this concern directly in the revised manuscript.

      We added the following paragraph to the discussion:

      “In our study, we assessed causal perception by asking observers to report whether they observed a launch or a pass in events of varying ambiguity. This method assumes that launches and passes can be mapped onto a dimension that ranges from causal to non-causal impressions. It has been questioned whether pass events are a natural representative of noncausal events: Observers often report high impressions of causality upon first exposure to pass events, which then decrease after seeing a canonical launch (Bechlivanidis, Schlottmann, & Lagnado, 2019). In our study, therefore, participants completed a separate session that included canonical launches before starting the main experiment.”

      Furthermore, since the only report options are "launch" or "pass", it is also possible that "launch" reports are not indications of "I experienced a causal event" but rather "I did not experience a pass event". It seems possible to me that different adaptation transfer effects (e.g. selectivity to motion direction, speed, or color-swapping) change the way that participants interpret the task, or the uncertainty of their impression. For example, it could be that adaptation increases the likelihood of experiencing a "pass" event in a direction-selective manner, without changing causal impressions. Increases of "pass" impressions (or at least, uncertainty around what was experienced) would produce a leftward shift in the PSE as reported in Experiment 1, but this does not necessarily mean that experiences of causal events changed. Thus, changes in the PSEs between the conditions in the different experiments may not directly reflect changes in causal impressions. I would like the authors to clarify the extent to which these concerns call their conclusions into question. 

      Indeed, PSE shifts are subject to cognitive influences and can even be voluntarily shifted (Morgan et al., 2012). We believe that decision biases (e.g., reporting the presence of a launch before adaptation vs. reporting the absence of a pass after adaptation) are unlikely to explain the high specificity of aftereffects observed in the current study. While such aftereffects are very typical of visual processing (Webster, 2015), it is unclear how a mechanism that increases the likelihood of perceiving a pass could account for the retinotopy of adaptation to launches (Rolfs et al., 2013) or the recently reported selective transfer of adaptation for only some causal categories (Kominsky & Scholl, 2020). The latter authors revealed a transfer of adaptation from triggering to launching, but not from entraining events to launching. Based on these arguments, we decided not to include this point in the revised manuscript.

      Leaving these concerns aside, I am also left wondering about the functional significance of these specialised mechanisms. Why would direction matter but speed and object identity not? Surely object identity, in particular, should be relevant to real-world interpretations and inputs of these visual routines? Is color simply too weak an identity? 

      We agree that it would be beneficial to have mechanisms in place that are specific for certain object identities. Overall, our results fit very well with established claims that only spatiotemporal parameters mediate the perception of causality (Michotte, 1963; Leslie, 1984; Scholl & Tremoulet, 2000). We have now explicitly listed these references again in the revised manuscript. It is important to note that an understanding of a causal relation could suffice to track identity information based purely on spatiotemporal contingencies, neglecting distinguishing surface features.

      We revised the manuscript and state:

      “Our findings therefore provide additional support for the claim that an event’s spatiotemporal parameters mediate the perception of causality (Michotte, 1963; Leslie, 1984; Scholl & Tremoulet, 2000).”

      Moreover, we think our findings of directional selectivity have functional relevance. First, direction-selective detection of collisions allows for an adaptation that occurs separately for each direction. That means that the visual system can calibrate these visual routines for detecting causal interactions in response to real-world statistics that reflect differences in directions. For instance, due to gravity, objects will simply fall to the ground. Causal relations such as launches are likely to be more frequent in horizontal directions, along a stable ground. Second, we think that causal visual events are action-relevant, that is, acting on (potentially) causal events promises an advantage (e.g., avoiding a collision, or quickly catching an object that has been pushed away). The faster we can detect such causal interactions, the faster we can react to them. Direction-selective motion signals are available in the first stages of visual processing. Visual routines that are based on these direction-selective motion signals promise to enable such fast computations. Please note, however, that while our present findings demonstrate direction-selectivity, they do not pinpoint where exactly that visual routine is located. It is quite possible that the visual routine is located higher up in the visual system, relying on a direction-selective population response as input.

      We added these points to the discussion of the functional relevance: 

      “We suggest that at least two functional benefits result from a specialized visual routine for detecting causality. First, a direction-selective detection of launches allows adaptation to occur separately for each direction. That means that the visual system can automatically calibrate the sensitivity of these visual routines in response to real-world statistics. For instance, while falling objects drop vertically towards the ground, causal relations such as launches are common in horizontal directions moving along a stable ground. Second, we think that causal visual events are action-relevant, and the faster we can detect such causal interactions, the faster we can react to them. Direction-selective motion signals are available very early on in the visual system. Visual routines that are based on these direction-selective motion signals may enable faster detection. While our present findings demonstrate direction-selectivity, they do not pinpoint where exactly that visual routine is located. It is possible that the visual routine is located higher up in the visual system (or distributed across multiple levels), relying on a direction-selective population response as input.”

      Reviewer #3 (Recommendations for the authors):

      - The concept of "visual routines" is used without introduction; for a general-interest audience it might be good to include a definition and reference(s) (e.g. Ullman.). 

      Thank you very much for highlighting that point. We have chosen the term visual routine to emphasize that we locate the part of the mechanism that is affected by the adaptation in our experiments in the visual system, but at the same time it leaves space regarding the extent to which the mechanism further involves mid- and higher-level processes. The term thus clearly refers to the concept of visual routines introduced by Ullman (1987). We have now explained what we mean by visual routine, and we also included the reference in the revised manuscript.

      We added the following footnote to the introduction:

      “We use the term visual routine here to highlight that our adaptation experiments can reveal a causality detection mechanism that resides in the visual system. At the same time, calling it a routine emphasizes similarities with a local, semi-independent operation (e.g., the recognition of familiar motion patterns; see also Ullman, 1987; Cavanagh, Labianca, & Thornton, 2001) that can engage mid- and higher-level processes (e.g., during causal capture, Scholl & Nakayama, 2002; or multisensory integration, Körding et al., 2007).”

      - I would appreciate slightly more description of the phenomenology of the WW adaptors: is this Michotte's "entraining" event? Does it look like one disc shunts the other?  

      The stimulus differs from Michotte's entrainment event in both spatiotemporal parameters and phenomenology. We added videos for the launch, pass and slip events as Supplementary Material.

      Moreover, we described the slip event in the methods section:

      “In two additional sessions, we presented slip events as adaptors to control that the adaptation was specific for the impression of causality in the launching events. Slip events are designed to match the launching events in as many physical properties as possible while producing a very different, non-causal phenomenology. In slip events, the first peripheral disc also moves towards a stationary disc. In contrast to launching events, however, the first disc passes the stationary disc and stops only when it is adjacent to the opposite edge of the stationary disc. While slip events do not elicit a causal impression, they have the same number of objects and motion onsets, the same motion direction and speed, as well as the same spatial area of the event as launches.”

      In the revised manuscript, we also added more information on the slip event at the beginning of the results section. Importantly, the stimulus typically produces the impression of two independent movements and thus serves as a non-causal control condition in our study. Only anecdotally, some observers (not involved in this study) who saw the stimulus spontaneously described their phenomenology of seeing a slip event as a double step or a discus throw.

      We added the following description to the results section:

      “Moreover, we compared the visual adaptation to launches to a (non-causal) control condition in which we presented slip events as adaptor. In a slip event, the initially moving disc passes completely over the stationary disc, stops immediately on the other side, and then the initially stationary disc begins to move in the same direction without delay. Thus, the two movements are presented consecutively without a temporal gap. This stimulus typically produces the impression of two independent (non-causal) movements.”

      - In general more illustrations of the different conditions (similar to Figure 1c but for the different experimental conditions and adaptors) might be helpful for skim readers.  

      We followed the reviewer’s recommendation and added a visualization of the adaptor and the test events for the different experiments in Figure 2.

      - Were the luminances of the red and green balls in experiment 3 matched? Were participants checked for color anomalous vision?  

      Yes, we checked for color anomalous vision using the color test Tafeln zur Prüfung des Farbensinnes/Farbensehens (Kuchenbecker & Broschmann, 2016). We added that information to the manuscript. The red and green discs were not matched for luminance. We measured the luminance after the experiment (21 cd/m<sup>2</sup> for the green disc and 6 cd/m<sup>2</sup> for the red disc). Please note that the difference in luminance should not pose a problem for the interpretation of the results, as we observe a transfer of the adaptation across the two colors.

      We added the following information to the manuscript:

      “The red and green discs were not matched for luminance. Measurements obtained after the experiments yielded a luminance of 21 cd/m<sup>2</sup> for the green disc and 6 cd/m<sup>2</sup> for the red disc.”

      “All observers had normal or corrected-to-normal vision and color vision as assessed using the color test Tafeln zur Prüfung des Farbensinnes/Farbensehens (Kuchenbecker & Broschmann, 2016).”

      - Relationship of this work to the paper by Arnold et al., (2015). That paper suggested that some effects of adaptation of launching events could be explained by an adaptation of object shape, not by causality per se. It is superficially difficult to see how one could explain the present results from the perspective of object "squishiness" -- why would this be direction selective? In other words, the present results taken at face value call the "squishiness" explanation into question. The authors could consider an explanation to reconcile these findings in their discussion. 

      Indeed, the paper by Arnold and colleagues (2015) suggested that a contact-launch adaptor could lead to a squishiness aftereffect, arguing that the object's elasticity changed in response to the adaptation. Importantly, the same study found an object-centered rather than a retinotopic adaptation effect. However, the retinotopic nature of the negative aftereffect used in our study has been replicated repeatedly (for instance, Kominsky & Scholl, 2020). The divergent results of Arnold and colleagues may therefore reflect differences in the task (i.e., observers had to judge whether they perceived a soft vs. hard bounce) or in the stimuli (i.e., bounces of a disc and a wedge, with the discs moving on a circular trajectory). It would be important to replicate these results first and then determine whether their squishiness effect is direction-selective as well. We now acknowledge the study by Arnold and colleagues in the discussion:

      “The adaptation of causality is spatially specific to the retinotopic coordinates of the adapting stimulus (Kominsky & Scholl, 2020; Rolfs et al., 2013; for an object-centered elasticity aftereffect using a related stimulus on a circular motion path, see Arnold et al., 2015), suggesting that the detection of causal interactions is implemented locally in visual space.”

      - Line 32: "showing that a specialized visual routine for launching events exists even within separate motion direction channels". This doesn't necessarily mean the routine is within each separate direction channel, only that the output of the mechanism depends on the population response over motion direction. The critical motion computation could be quite high level -- e.g. global pattern motion in MST. Please clarify the claim. 

      We agree with the reviewer that critical parts of the visual routine could instead rely on the aggregated population response over motion direction at higher levels of processing. We acknowledge this possibility in the discussion of the functional relevance of the proposed mechanism and when suggesting that a distributed brain network may contribute to the perception of causality.

      We would like to highlight the following two revised paragraphs.

      “[…] Second, we think that causal visual events are action-relevant, and the faster we can detect such causal interactions, the faster we can react to them. Direction-selective motion signals are available very early on in the visual system. Visual routines that are based on these direction-selective motion signals may enable faster detection. While our present findings demonstrate direction-selectivity, they do not pinpoint where exactly that visual routine is located. It is possible that the visual routine is located higher up in the visual system (or distributed across multiple levels), relying on a direction-selective population response as input.”

      Moreover, when discussing the neurophysiological literature we write:

      “Interestingly, single-cell recordings in area F5 of the primate brain revealed that motor areas contribute to the perception of causality (Caggiano et al., 2016; Rolfs, 2016), emphasizing the distributed nature of the computations underlying causal interactions. This finding also stresses that the detection, and the prediction, of causality is essential for processes outside purely sensory systems (e.g., for understanding others’ actions, for navigating, and for avoiding collisions).”

      -  p. 10 line 30: typo "particual".  

      Done.

      -  p. 10 line 37: "This findings rules out (...)" should be singular "This finding rules out (...)". 

      Done.

      -  Spelling error throughout: "underly" should be "underlie". 

      Done.

      -  p.11 line 29: "emerges fast and automatic" should be "automatically". 

      Done.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript focuses on the olfactory system of Pieris brassicae larvae and the importance of olfactory information in their interactions with the host plant Brassica oleracea and the major parasitic wasp Cotesia glomerata. The authors used CRISPR/Cas9 to knockout odorant receptor co-receptors (Orco), and conducted a comparative study on the behavior and olfactory system of the mutant and wild-type larvae. The study found that Orco-expressing olfactory sensory neurons in antennae and maxillary palps of Orco knockout (KO) larvae disappeared, and the number of glomeruli in the brain decreased, which impairs the olfactory detection and primary processing in the brain. Orco KO caterpillars show weight loss and loss of preference for optimal food plants; KO larvae also lost weight when attacked by parasitoids with the ovipositor removed, and mortality increased when attacked by untreated parasitoids. On this basis, the authors further studied the responses of caterpillars to volatiles from plants attacked by the larvae of the same species and volatiles from plants on which the caterpillars were themselves attacked by parasitic wasps. Lack of OR-mediated olfactory inputs prevents caterpillars from finding suitable food sources and from choosing spaces free of enemies.

      Strengths:

      The findings help to understand the important role of olfaction in caterpillar feeding and predator avoidance, highlighting the importance of odorant receptor genes in shaping ecological interactions.

      Weaknesses:

      There are the following major concerns:

      (1) Possible non-targeted effects of Orco knockout using CRISPR/Cas9 should be analyzed and evaluated in Materials and Methods and Results.

      Thank you for your suggestion. In the Materials and Methods, we describe how we selected the target region and evaluated potential off-target sites using Exonerate and CHOPCHOP. Neither method identified potential off-target sites with more than 17 nt of alignment identity. We therefore assumed that our Orco KO has no off-target effects. Furthermore, we did not find any developmental differences between WT and KO caterpillars when these were reared on leaf discs in Petri dishes (Fig S4). We will further highlight this information on the off-target evaluation in the Results section of our revised manuscript.
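      To make the identity threshold concrete, the sketch below counts identical positions between a guide and candidate sites and flags any site above 17 nt. It is a deliberately simplified, ungapped stand-in: Exonerate and CHOPCHOP perform proper alignments (CHOPCHOP also accounts for the PAM), and the function names and sequences here are hypothetical, chosen only to illustrate what "more than 17 identical nucleotides" means.

```python
from typing import Dict, List


def ungapped_identity(guide: str, site: str) -> int:
    """Count positions at which two equal-length sequences carry the same base."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a == b for a, b in zip(guide.upper(), site.upper()))


def flag_off_targets(guide: str, sites: Dict[str, str], threshold: int = 17) -> List[str]:
    """Return names of candidate sites whose identity to the guide exceeds threshold."""
    return [name for name, seq in sites.items() if ungapped_identity(guide, seq) > threshold]


# Hypothetical 20-nt guide and candidate sites (illustration only, not Orco sequences).
guide = "ACGT" * 5
candidates = {
    "site_2mm": "TTGT" + "ACGT" * 4,  # 2 mismatches -> 18 identical nt
    "site_3mm": "TTAT" + "ACGT" * 4,  # 3 mismatches -> 17 identical nt
}

print(flag_off_targets(guide, candidates))  # ['site_2mm']
```

      Only the site with 18 identical positions exceeds the threshold; a site with exactly 17 does not.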

      (2) Figure 1E: Only one olfactory receptor neuron was marked in WT. There are at least three olfactory sensilla at the top of the maxillary palp. Therefore, to explain the loss of Orco-expressing neurons in the mutant (Figure 1F), a more rigorous explanation of the photo is required.

      Thank you for pointing this out. The figure shows only a qualitative comparison between WT and KO; we did not aim to determine the total number of Orco-positive neurons in the maxillary palps or antennae of WT and KO caterpillars (for neuron numbers in the caterpillar antennae, please see our previous work, Wang et al., 2023). We did indeed find more than one neuron in the maxillary palps, but because these were in very different image planes, it was not possible to visualize them together. However, we will add a few sentences to the Results and Discussion section to explain the results of the maxillary palp Orco staining.

      (3) In Figure 1G, H, the four glomeruli are circled by dotted lines: their corresponding relationship between the two figures needs to be further clarified.

      Thank you for pointing this out. The four glomeruli in Figure 1G and 1H do not strictly correspond to each other. We circled these glomeruli to highlight them because they are the most clearly visualized in this view. In this study, we only counted the number of glomeruli in both WT and KO; we did not determine which glomeruli are missing in the KO caterpillar brain. We will further explain this in the figure legend.

      (4) Line 130: Since the main topic in this study is the olfactory system of larvae, the experimental results of this part are all about antennal electrophysiological responses, mating frequency, and egg production of female and male adults of wild type and Orco KO mutant, it may be considered to include this part in the supplementary files. It is better to include some data about the olfactory responses of larvae.

      Thank you for your suggestion. We do agree with your suggestion, and we will consider moving this part to the supplementary information. Regarding larval olfactory response, we unfortunately failed to record any spikes using single sensillum recordings due to the difficult nature of the preparation; however, we do believe that this would be an interesting avenue for further research.

      (5) Line 166: The sentences in the text are about the choice test between " healthy plant vs. infested plant", while in Fig 3C, it is "infested plant vs. no plant". The content in the text does not match the figure.

      Thank you for pointing this out. The sentence is “We compared the behaviors of both WT and Orco KO caterpillars in response to clean air, a healthy plant and a caterpillar-infested plant”. We tested these three stimuli in two comparisons: healthy plant vs no plant, infested plant vs no plant. The two comparisons are shown in Figure 3C separately. We will aim to describe this more clearly in the revised version of the manuscript.

      (6) Lines 174-178: Figure 3A showed that the body weight of Orco KO larvae in the absence of parasitic wasps also decreased compared with that of WT. Therefore, in the experiments of Figure 3A and E, the difference in the body weight of Orco KO larvae in the presence or absence of parasitic wasps without ovipositors should also be compared. The current data cannot determine the reduced weight of KO mutant is due to the Orco knockout or the presence of parasitic wasps.

      Thank you for pointing this out. We did not compare the data of Figures 3A and 3E because the two experiments were not conducted at the same time, owing to the limited space in our BioSafety Ⅲ greenhouse. We agree that the weight decrease in Figure 3E is partly due to the reduced caterpillar growth shown in Figure 3A. However, we are confident that the additional decrease in caterpillar weight shown in Figure 3E is mainly driven by the presence of disarmed parasitoids. Specifically, in Figure 3A the average weight is 0.4544 g for WT and 0.4230 g for KO, so KO caterpillars reach 93.1% of the WT weight. In Figure 3E, the average weight is 0.4273 g for WT and 0.3637 g for KO, so KO caterpillars reach only 85.1% of the WT weight. We will discuss this interaction between caterpillar growth and the effect of the parasitoid attacks more extensively in the revised version of the manuscript.
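      The percentages above follow directly from the reported mean weights; the short sketch below recomputes them. It is descriptive arithmetic only, using the means quoted in this response, and is not a substitute for a statistical test of the genotype × parasitoid interaction (the two experiments were run at different times).

```python
# Mean caterpillar weights (g) as quoted in the response above.
weights = {
    "Fig. 3A (no parasitoids)": {"WT": 0.4544, "KO": 0.4230},
    "Fig. 3E (disarmed parasitoids)": {"WT": 0.4273, "KO": 0.3637},
}


def ko_percent_of_wt(wt: float, ko: float) -> float:
    """KO mean weight expressed as a percentage of the WT mean weight."""
    return 100.0 * ko / wt


for experiment, w in weights.items():
    pct = ko_percent_of_wt(w["WT"], w["KO"])
    print(f"{experiment}: KO = {pct:.1f}% of WT")
# Fig. 3A (no parasitoids): KO = 93.1% of WT
# Fig. 3E (disarmed parasitoids): KO = 85.1% of WT
```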

      (7) Lines 179-181: Figure 3F shows that the survival rate of larvae of Orco KO mutant decreased in the presence of parasitic wasps, and the difference in survival rate of larvae of WT and Orco KO mutant in the absence of parasitic wasps should also be compared. The current data cannot determine whether the reduced survival of the KO mutant is due to the Orco knockout or the presence of parasitic wasps.

      We are happy that you highlight this point. When conducting these experiments, we selected groups of caterpillars and carefully placed them on a leaf with minimal disturbance, which minimized injury and mortality. We did test the survival of caterpillars in the absence of parasitoid wasps in the experiment presented in Figure 3A, although these data were not included in the manuscript. There is no significant difference in the survival rate of caterpillars between the two genotypes in the absence of wasps (average mortality WT = 8.8 %, average mortality KO = 2.9 %; P = 0.088, Wilcoxon test), so the decreased survival rate is most likely due to the attack of the wasps. We will add this information to the revised version of the manuscript.

      (8) In Figure 4B, why do the compounds tested have no volatiles derived from plants? Cruciferous plants have the well-known mustard bomb. In the behavioral experiments, the larvae responses to ITC compounds were not included, which is suggested to be explained in the discussion section.

      Thank you for the suggestion. We assume you mean Figure 4D/4E instead of Figure 4B. In Figure 4B, many of the identified chemical compounds are essentially plant volatiles, especially those from caterpillar frass and caterpillar spit. In Figure 4D/4E, most of the tested chemicals are derived from plants. We did include several ITCs in the butterfly EAG tests shown in Figure 2A/B; however, because the butterfly antennae did not respond strongly to ITCs, we did not include ITCs in the subsequent larval behavioural tests. Instead, the chemicals tested in Figure 4D/4E either elicit high EAG responses in butterflies or were identified as significant by VIP scores in the chemical analyses. We will add this explanation to the revised version of our manuscript.

      (9) The custom-made setup and the relevant behavioral experiments in Figure 4C need to be described in detail (Line 545).

      We will add more detailed descriptions for the setup and method in the Materials and Methods.

      (10) Materials and Methods Line 448: 10 μL paraffin oil should be used for negative control.

      Thank you for pointing this out. We used both clean filter paper and clean filter paper with 10 μL paraffin oil as negative controls, but we did not find a significant difference between the two controls. Therefore, in the EAG results of Figure 2A/2B, we presented paraffin oil as one of the tested chemicals. We will re-run our statistical tests with paraffin oil as negative control, although we do not expect any major differences to the previous tests.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigated the effect of olfactory cues on caterpillar performance and parasitoid avoidance in Pieris brassicae. The authors knocked out Orco to produce caterpillars with significantly reduced olfactory perception. These caterpillars showed reduced performance and increased susceptibility to a parasitoid wasp.

      Strengths:

      This is an impressive piece of work and a well-written manuscript. The authors have used multiple techniques to investigate not only the effect of the loss of olfactory cues on host-parasitoid interactions, but also the mechanisms underlying this.

      Weaknesses:

      (1) I do have one major query regarding this manuscript - I agree that the results of the caterpillar choice tests in a y-maze give weight to the idea that olfactory cues may help them avoid areas with higher numbers of parasitoids. However, the experiments with parasitoids were carried out on a single plant. Given that caterpillars in these experiments were very limited in their potential movement and source of food - how likely is it that avoidance played a role in the results seen from these experiments, as opposed to simply the slower growth of the KO caterpillars extending their period of susceptibility? While the two mechanisms may well both take place in nature - only one suggests a direct role of olfaction in enemy avoidance at this life stage, while the other is an indirect effect, hence the distinction is important.

      We do agree with your comment that both mechanisms may be at work in nature, and we do address this in the Discussion section. In our study, we did find that wildtype caterpillars were more efficient in locating their food source and did grow faster on full plants than knockout caterpillars. This faster growth will enable wildtype caterpillars to more quickly outgrow the life-stages most vulnerable to the parasitoids (L1 and L2). The olfactory system therefore supports the escape from parasitoids indirectly by enhancing feeding efficiency directly.

      In addition, we show in our Y-tube experiments that WT caterpillars were able to avoid plants on which conspecifics were under attack by parasitoids (Figure 3D). We therefore speculate that WT caterpillars use volatiles from the plant, or from conspecifics via their spit or faeces, to avoid plants or leaves that potentially attract natural enemies. Knockout caterpillars are unable to use these volatile danger cues and therefore do not avoid plants or leaves that are most attractive to their natural enemies, making KO caterpillars more susceptible and exposing them to more natural-enemy harassment. Through this, olfaction also directly impacts the ability of a caterpillar to find an enemy-free feeding site.

      We think that olfaction supports the enemy avoidance of caterpillars via both these mechanisms, although at different time scales. Unfortunately, our analysis was not detailed enough to discern the relative importance of the two mechanisms we found. However, we feel that this would be an interesting avenue for further research. Moreover, we will sharpen our discussion on the potential importance of the two different mechanisms in the revised version of this manuscript.

      (2) My other issue was determining sample sizes used from the text was sometimes a bit confusing. (This was much clearer from the figures).

      We will revise how sample sizes are reported in the text to make them clearer.

      (3) I also couldn't find the test statistics for any of the statistical methods in the main text, or in the supplementary materials.

      Thank you for pointing this out. We will provide more detailed test statistics in the main text and in the supplementary materials of the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Summary:

      This paper describes molecular dynamics simulations (MDS) of the dynamics of two T-cell receptors (TCRs) bound to the same major histocompatibility complex molecule loaded with the same peptide (pMHC). The two TCRs (A6 and B7) bind to the pMHC with similar affinity and kinetics, but employ different residue contacts. The main purpose of the study is to quantify via MDS the differences in the inter- and intra-molecular motions of these complexes, with a specific focus on what the authors describe as catch-bond behavior between the TCRs and pMHC, which could explain how T-cells can discriminate between different peptides in the presence of weak separating force.

      Strengths:

      The authors present extensive simulation data that indicates that, in both complexes, the number of high-occupancy interdomain contacts initially increases with applied load, which is generally consistent with the authors’ conclusion that both complexes exhibit catch-bond behavior, although to different extents. In this way, the paper somewhat expands our understanding of peptide discrimination by T-cells.

      a. The reviewer makes a thoughtful assessment of our manuscript. While our manuscript is meant to be a “short” contribution, our significant new finding is that even for TCRs targeting the same pMHC, having similar structures, and leading to similar functional outcomes in conventional assays, their response to applied load can be different. This supports our recent experimental work where TCRs targeting the same pMHC differed in their catch bond characteristics, and importantly, in their response to limiting copy numbers of pMHCs on the antigen-presenting cell (Akitsu et al., Sci. Adv., 2024).

      Weaknesses:

      While generally well supported by data, the conclusions would nevertheless benefit from a more concise presentation of information in the figures, as well as from suggesting experimentally testable predictions.

      b. We have updated all figures for clear and streamlined presentation. We have also created four figure supplements to cover more details.

      Regarding testable predictions, an important prediction is that the B7 TCR would exhibit a weaker catch bond behavior than A6 (line 297–298). This is a nontrivial prediction because the two TCRs targeting the same pMHC have similar structures and are functionally similar in conventional assays. This prediction can be tested by single-molecule optical tweezers experiments. Based on our recent experiments (Akitsu et al., Sci. Adv., 2024), we also predict that A6 and B7 TCRs will differ in their ability to respond when the number of presented pMHC molecules is limited. Details of how they would differ require further investigation, which is beyond the scope of the present work (line 314-319).

      Another testable prediction, concerning the conservation of the basic allosteric mechanism, involves the Cβ FG-loop deletion mutant located at the hinge region of the β chain, in which the deletion severely impairs catch bond formation (line 261–264).

      Reviewer 2:

      In this work, Chang-Gonzalez and coworkers follow up on an earlier study on the force-dependence of peptide recognition by a T-cell receptor using all-atom molecular dynamics simulations. In this study, they compare the results of pulling on a TCR-pMHC complex between two different TCRs with the same peptide. A goal of the paper is to determine whether the newly studied B7 TCR has the same load-dependent behavior mechanism shown in the earlier study for A6 TCR. The primary result is that while the unloaded interaction strength is similar, A6 exhibits more force stabilization.

      This is a detailed study, and establishing the difference between these two systems with and without applied force may establish them as a good reference setup for others who want to study mechanobiological processes if the data were made available, and could give additional molecular details for T-Cell-specialists. As written, the paper contains an overwhelming amount of details and it is difficult (for me) to ascertain which parts to focus on and which results point to the overall take-away messages they wish to convey.

      R2-a. As mentioned above and as the reviewer correctly pointed out, the condensed appearance of this manuscript arose largely because we intended it to be a Research Advances article as a short follow up study of our previous paper on A6 TCR published in eLife. Most of the analysis scripts for the A6 TCR study are already available on Github. For the present manuscript, we have created a separate Github repository containing sample simulation systems and scripts for the B7 TCR.

      Regarding the focus issue, it is in part due to the complex nature of the problem, which required simulations under different conditions and multi-faceted analyses. We believe the extensive updates to the figures and text make the presentation clearer. We note, however, that even in the earlier version, the reviewer captured the main take-away message well: “The primary result is that while the unloaded interaction strength is similar, A6 exhibits more force stabilization.”

      Detailed comments:

      (1) In Table 1 - are the values of the extension column the deviation from the average length at zero force (that is what I would term extension) or is it the distance between anchor points (which is what I would assume based on the large values. If the latter, I suggest changing the heading, and then also reporting the average extension with an asterisk indicating no extensional restraints were applied for B7-0, or just listing 0 load in the load column. Standard deviation in this value can also be reported. If it is an extension as I would define it, then I think B7-0 should indicate extension = 0+/- something. The distance between anchor points could also be labeled in Figure 1A.

      R2-b. “Extension” is the distance between anchor points that the reviewer is referring to (blue spheres at the ends of the added strands in Figure 1A). While its meaning should be clear in the section “Laddered extensions” in “MD simulation protocol” (line 357–390), in a strict sense, we agree that using it for the end-to-end distance can be confusing. However, since we have already used it in our previous two papers (Hwang et al., PNAS 2020 and Chang-Gonzalez et al., eLife, 2024), we prefer to keep it for consistency. Instead, in the caption of Table 1, we explained its meaning, and also explicitly labeled it in Figure 1A, as the reviewer suggested.

      Please also note that the no-load case B7<sup>0</sup> was performed by separately building a TCR-pMHC complex without added linkers (line 352), and holding the distal part of pMHC (the α3 domain) with weak harmonic restraints (line 406–408). Thus, no extension can be assigned to B7<sup>0</sup>. We added a brief explanation about holding the MHC α3 domain for B7<sup>0</sup> in line 83–85.

      (2) As in the previous paper, the authors apply “constant force” by scanning to find a particular bond distance at which a desired force is selected, rather than simply applying a constant force. I find this approach less desirable unless there is experimental evidence suggesting the pMHC and TCR were forced to be a particular distance apart when forces are applied. It is relatively trivial to apply constant forces, so in general, I would suggest this would have been a reasonable comparison. Line 243-245 speculates that there is a difference in catch bonding behavior that could be inferred because lower force occurs at larger extensions, but I do not believe this hypothesis can be fully justified and could be due to other differences in the complex.

      R2-c. There is indeed experimental evidence that the TCR-pMHC complex operates under constant separation. The spacing between a T cell and an antigen-presenting cell is maintained by adhesion molecules such as the CD2-CD58 pair, as explained in our paper on the A6 TCR (Chang-Gonzalez et al., eLife, 2024) and in our previous review paper (Reinherz et al., PNAS, 2023). In in vitro single-molecule experiments, pulling to a fixed separation and holding is also commonly done. We added an explanation about this in line 79–83 of the manuscript. On the other hand, the force between a T cell and an antigen-presenting cell is also controlled by the actin cytoskeleton, so the applied load is not a simple function of the separation between the two cells. An explanation about this was added in line 300–303. A detailed comparison between constant-extension and constant-force simulations is definitely a subject of our future study.

      Regarding line 243–245 of the original submission (line 297–298 of the revised manuscript), we agree with the reviewer that, without further tests, lower forces at larger extensions cannot by themselves indicate that B7 forms a weaker catch bond. However, additional information shows that this observation is relevant to catch bond strength. In addition to having fewer TCR-pMHC contacts (Figure 1C of our manuscript), B7 also has fewer intra-TCR contacts than A6 (bottom panel of Figure 1D vs. Chang-Gonzalez et al., eLife, 2024, Figure 8A,B, first column). Based on these data, we calculated the average total intra-TCR contact occupancy in the 500–1000-ns interval, which was 30.4±0.49 (average±std) for B7 and 38.7±0.87 for A6. This result shows that the B7 TCR forms a looser complex with pMHC than A6 does. Also, B7<sup>low</sup> and B7<sup>high</sup> differ in extension by 16.3 Å, while A6<sup>low</sup> and A6<sup>high</sup> differ by 5.1 Å, for a similar ∼5-pN difference between the low- and high-load cases. With the higher compliance of B7, it would be more difficult to achieve load-induced stabilization of the TCR-pMHC interface, hence a weaker catch bond. We explained this in line 129–132 and line 292–297.
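      One rough way to read the extension numbers above is as an effective compliance, i.e., the change in extension per unit change in load. The sketch below assumes the approximately 5-pN load difference applies equally to both TCRs and treats the response as linear over that range; both are simplifying assumptions for illustration, not additional simulation results.

```python
# Extension differences (Å) between low- and high-load simulations, as quoted above.
delta_extension_angstrom = {"B7": 16.3, "A6": 5.1}
# Approximate load difference (pN); assumed equal for both TCRs (simplification).
delta_load_pn = 5.0


def effective_compliance(d_ext: float, d_load: float) -> float:
    """Crude linear compliance estimate: extension change per unit load change (Å/pN)."""
    return d_ext / d_load


for tcr, d_ext in delta_extension_angstrom.items():
    print(f"{tcr}: ~{effective_compliance(d_ext, delta_load_pn):.2f} Å/pN")
# B7: ~3.26 Å/pN
# A6: ~1.02 Å/pN
```

      Under these assumptions, B7 stretches roughly three times more than A6 for the same added load, consistent with the looser complex described above.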

      (3) On a related note, the authors do not refer to or consider other works using MD to study force-stabilized interactions (e.g. for catch bonding systems), e.g. these cases where constant force is applied and enhanced sampling techniques are used to assess the impact of that applied force: https://www.cell.com/biophysj/fulltext/S0006-3495(23)00341-7, https://www.biorxiv.org/content/10.1101/2024.10.10.617580v1. I was also surprised not to see this paper on catch bonding in pMHC-TCR referred to, which also includes some MD simulations: https://www.nature.com/articles/s41467-023-38267-1

      R2-d. We thank the reviewer for bringing the three papers to our attention, which are:

      (1) Languin-Cattoën, Sterpone, and Stirnemann, Biophys. J. 122:2744 (2023): About bacterial adhesion protein FimH.

      (2) Peña Ccoa, et al., bioRxiv (2024): About actin binding protein vinculin.

      (3) Choi et al., Nat. Comm. 14:2616 (2023): About a mathematical model of the TCR catch bond.

      Catch bond mechanisms of FimH and vinculin are different from that of TCR in that FimH and vinculin have relatively well-defined weak- and strong-binding states where there are corresponding crystal structures. Availability of the end-state structures permits simulation approaches such as enhanced sampling of individual states and studying the transition between the two states. In contrast, TCR does not have any structurally well-defined weak- or strong-binding states, which requires a different approach. As demonstrated in our current manuscript as well as in our previous two papers (Hwang et al., PNAS 2020 and Chang-Gonzalez et al., eLife, 2024), our microsecond-long simulations of the complex under realistic pN-level loads and a combination of analysis methods are effective for elucidating the catch bond mechanism of TCR. These are explained in line 227–238 of the manuscript.

      The third paper (Choi, et al., 2023) proposes a mathematical model to analyze extensive sets of data; the authors also performed new experiments and additional simulations. Of note, their model assumptions are based mainly on the steered MD (SMD) simulations in their previous paper (Wu, et al., Mol. Cell. 73:1015, 2019). In their model, formation of a catch bond (called catch-slip bond in Choi’s paper) requires partial unfolding of MHC and tilting of the TCR-pMHC interface. Our mechanism does not conflict with their assumptions, since the complex in the fully folded state should first bear load in a ligand-dependent manner in order to allow any larger-scale changes. This is explained in line 239–243.

      For the revised text mentioned above (lines 227–243), in addition to the three papers that the reviewer pointed out, we cited the following papers:

      • Thomas, et al., Annu. Rev. Biophys. 2008: Catch bond mechanisms in general.

      • Bakolitsa et al., Cell 1999; Le Trong et al., Cell 2010; Sauer et al., Nat. Comm. 2016; Mei et al., eLife 2020: Crystal structures of FimH and vinculin in different states.

      • Wu, et al., Mol. Cell. 73:1015, 2019: The SMD simulation paper mentioned above.

      (4) The authors should make at least the input files for their system available in a public place (github, zenodo) so that the systems are a more useful reference system as mentioned above. The authors do not have a data availability statement, which I believe is required.

      R2-d. As mentioned in R2-a above, we have added a GitHub repository containing sample simulation systems and scripts for the B7 TCR.

      Reviewer 3:

      Summary:

      The paper by Chang-Gonzalez et al. is a molecular dynamics (MD) simulation study of the dynamic recognition (load-induced catch bond) by the T cell receptor (TCR) of the complex of peptide antigen (p) and the major histocompatibility complex (pMHC) protein. The methods and simulation protocols are essentially identical to those employed in a previous study by the same group (Chang-Gonzalez et al., eLife 2024). In the current manuscript, the authors compare the binding of the same pMHC to two different TCRs, B7 and A6 which was investigated in the previous paper. While the binding is more stable for both TCRs under load (of about 10-15 pN) than in the absence of load, the main difference is that, with the current MD sampling, B7 shows a smaller amount of stable contacts with the pMHC than A6.

      Strengths:

      The topic is interesting because of the (potential) relevance of mechanosensing in biological processes including cellular immunology.

      Weaknesses:

      The study is incomplete because the claims are based on a single 1000-ns simulation at each value of the load and thus some of the results might be marred by insufficient sampling, i.e., statistical error. After the first 600 ns, the higher load of B7<sup>high</sup> than B7<sup>low</sup> is due mainly to the simulation segment from about 900 ns to 1000 ns (Figure 1D). Thus, the difference in the average value of the load is within their standard deviation (9 ± 4 pN for B7<sup>low</sup> and 14.5 ± 7.2 pN for B7<sup>high</sup>, Table 1). Even more strikingly, Figure 3E shows a lack of convergence in the time series of the distance between the V-module and pMHC, particularly for B7<sup>0</sup> (left panel, yellow) and B7<sup>low</sup> (right panel, orange). More and longer simulations are required to obtain a statistically relevant sampling of the relative position and orientation of the V-module and pMHC.

      R3-a. The reviewer uses data points from the last 100 ns to raise an issue with sampling. However, because we apply realistic pN-range forces, the force fluctuates slowly. In fact, in our simulation of B7<sup>high</sup>, while the force peaks near 35 pN at 500 ns (Figure 1D of our manuscript), the interfacial contacts show no noticeable changes around 500 ns (Figure 2B and Figure 2–figure supplement 1C of our manuscript). A similarly slow fluctuation of force was also observed for A6 TCR (Figure 8 of Chang-Gonzalez et al., eLife (2024)). Thus, a wider time window must be considered rather than focusing on forces in the last 100-ns interval.

      To compare fluctuations in force, we added Figure 1–figure supplement 2, which is based on Appendix 3–Figure 1 of our A6 paper. It shows the standard deviation in force versus the average force during the 500–1000 ns interval for various simulations of both the A6 (open black circles) and B7 (red squares) systems. Except for Y8A<sup>low</sup> and dFG<sup>low</sup> of A6 (explained below), the data points lie nearly on a straight line.

      Thermodynamically, the force and the position of the restraint (blue spheres in Figure 1A of our manuscript) form a conjugate pair of a generalized force and its spatial variable in equilibrium at 300 K, akin to the pressure P and volume V of an ideal gas. If V is fixed, P fluctuates. Denoting the average and standard deviation of the pressure as ⟨P⟩ and ∆P, respectively, Burgess showed that ∆P/⟨P⟩ is a constant (Eq. 5 of Burgess, Phys. Lett. A, 44:37; 1973). Although the atoms of the TCRαβ-pMHC system are not an ideal gas, their motion causes the force on the restraints to fluctuate, analogous to the pressure arising from ideal gas molecules hitting a confining wall, with the restraint playing the role of the wall. Thus, the near-linear behavior in Figure 1–figure supplement 2 is a consequence of the system being many-bodied and at constant temperature. The linearity also indicates that force was sampled reasonably in the 500–1000-ns interval, and the fact that the A6 and B7 data follow a common linear profile further demonstrates the consistency of our force measurements. Regarding the two A6 outliers, Y8A<sup>low</sup> is for an antagonist peptide and dFG<sup>low</sup> is the Cβ FG-loop deletion mutant. Both had reduced numbers of contacts with pMHC, which likely allowed wider conformational motion and hence greater fluctuation in force.
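      The quantity plotted in Figure 1–figure supplement 2 (standard deviation versus average force over the 500–1000-ns window) can be sketched as below. This is a minimal illustration, not the authors' analysis script: the helper name window_force_stats, the inclusive window bounds, and the use of the population standard deviation are all assumptions. The returned ratio is the analogue of ∆P/⟨P⟩; if points from different simulations fall on one line through the origin, they share this ratio.

```python
import numpy as np


def window_force_stats(times_ns, forces_pn, start_ns=500.0, end_ns=1000.0):
    """Mean, standard deviation, and their ratio for force samples in a time window.

    Hypothetical helper for illustration; not the authors' analysis script.
    times_ns: sample times in ns; forces_pn: force on the restraints in pN.
    """
    times_ns = np.asarray(times_ns, dtype=float)
    forces_pn = np.asarray(forces_pn, dtype=float)
    # Keep only samples inside the averaging window (inclusive bounds assumed).
    in_window = (times_ns >= start_ns) & (times_ns <= end_ns)
    window = forces_pn[in_window]
    if window.size == 0:
        raise ValueError("no force samples inside the averaging window")
    mean = float(window.mean())
    std = float(window.std())  # population standard deviation
    # Analogue of Delta P / <P> for the restraint force.
    return mean, std, std / mean


# Synthetic example: a large transient before 500 ns, then a force alternating 8/12 pN.
times = np.arange(0, 1000)  # 0..999 ns
forces = np.where(times < 500, 100.0, np.where(times % 2 == 0, 8.0, 12.0))
print(window_force_stats(times, forces, start_ns=500, end_ns=999))
```

      Excluding the early transient is what makes the window average meaningful here: the 100-pN samples before 500 ns do not contribute to the mean or the spread.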

      At the reviewer's suggestion, we extended the simulations of B7<sup>0</sup>, B7<sup>low</sup>, and B7<sup>high</sup> to about 1500 ns (Table 1). While B7<sup>0</sup> and B7<sup>low</sup> behaved similarly, B7<sup>high</sup> started to lose contacts at around 1300 ns (top panel of Figure 1D and Figure 2B). A closer inspection revealed that destabilization occurred when the complex reached low-force states. Even before 1300 ns, at about 750 ns, the force on B7<sup>high</sup> drops below 5 pN, and another drop in force occurs at around 1250 ns, though to a lesser extent (Figure 1D). These changes are followed by an increase in the Hamming distance (Figure 2B). Thus, in B7<sup>high</sup>, destabilization is caused not by a high force but by a lack of force, which is consistent with the overarching theme of our work, the load-induced stabilization of the TCRαβ-pMHC complex.
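      As a hedged illustration of what a contact Hamming distance measures, the sketch below assumes that each frame is represented as a binary vector over a fixed list of residue pairs (True if the pair is in contact) and counts how many entries differ from a reference pattern, such as the contacts present at the start of the simulation. The function name hamming_to_reference and this representation are assumptions for illustration; the manuscript's exact contact criteria and reference set may differ.

```python
import numpy as np


def hamming_to_reference(contact_frames, reference):
    """Number of residue pairs whose contact state differs from a reference.

    Hypothetical helper; the contact definition and reference set are assumptions.
    contact_frames: (n_frames, n_pairs) booleans, True if the pair is in contact.
    reference: (n_pairs,) booleans, e.g. contacts present in the starting structure.
    Returns an (n_frames,) integer array.
    """
    frames = np.asarray(contact_frames, dtype=bool)
    ref = np.asarray(reference, dtype=bool)
    # Broadcasting compares every frame with the reference pattern.
    return np.count_nonzero(frames != ref, axis=1)


reference = [True, True, False]
frames = [[True, True, False],   # identical to the reference
          [True, False, False],  # one contact lost
          [False, False, True]]  # two contacts lost, one new contact formed
print(hamming_to_reference(frames, reference))
```

      Under this definition, both lost and newly formed contacts raise the distance, so a rising trace signals departure from the reference interface rather than contact loss alone.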

      The destabilization of B7<sup>high</sup> during our simulation is a combined effect of its overall weaker interface compared to A6 (despite a comparable number of contacts in the crystal structures; lines 265–269) and its high compliance (explained in the second paragraph of our response R2-c above). Under a fixed extension, a more compliant complex can reach a low-force state in which contacts can break. In reality, with an approximately constant spacing between a T cell and an antigen-presenting cell, force is also regulated by the actin cytoskeleton (explained in the first paragraph of R2-c above). While a detailed comparison between constant-extension and constant-force simulations is the subject of a future study, in this manuscript we used the 500–1000-ns interval for calculating time-averaged quantities, for consistency across different simulations. For time-dependent behaviors, we showed the full simulation trajectories, which are Figure 1D, Figure 2B, Figure 2–figure supplement 1 (except for panel E), and Figure 4–figure supplement 1B.

      Thus, rather than performing replicate simulations, we performed multiple simulations under different conditions and analyzed them from different angles to obtain a consistent picture. If one were interested in quantitative details under a given condition, e.g., the dynamics of contacts at a given extension or the time when destabilization occurs at a given force, replicate simulations would be necessary. However, our main conclusions, such as load-induced stabilization of the interface through the asymmetric motion and B7 forming a weaker complex than A6, can be drawn from our extensive analysis across multiple simulations. Please also note that reviewer 1 mentioned that our conclusions are “generally well supported by data.”

      A similar argument applies to Figure 2–figure supplement 1F (old Figure 3B that the reviewer pointed out). If precise values of the V-module to pMHC distance were needed, replicate simulations would be necessary; however, the figure demonstrates that, before the disruption at 1300 ns, B7<sup>high</sup> maintains a more stable interface than B7<sup>low</sup>, which is consistent with all other measures of interfacial stability we used. The above points are explained throughout our updated manuscript, including

      • Lines 106–110, 125–132, 156–158, and 298–303.

      • Figures showing time-dependent behaviors have been updated and Figure 1–figure supplement 2 has been added, as explained above.

      It is not clear why “a 10 Å distance restraint between αT218 and βA259 was applied” (section MD simulation protocol, page 9).

      R3-b. αT218 and βA259 are the residues attached to a leucine-zipper handle in the in vitro optical trap experiments (Das, et al., PNAS 2015). In T cells, these residues also connect to the transmembrane helices. Our newly added Figure 1–figure supplement 1 shows a model of the N15 TCR used in the experiments of Das et al., constructed based on PDB 1NFD. Blue spheres represent the C<sub>α</sub> atoms corresponding to αT218 and βA259 of B7 TCR; their distance is 6.7 Å. The 10-Å distance restraint in the simulation was applied to mimic the leucine zipper, which prevents excessive separation of the added strands. The distance restraint is a flat-bottom harmonic potential that is activated only when the distance between the two atoms exceeds 10 Å, which we did not clarify in our original manuscript. This is now explained in lines 371–373. The same restraint was used in our previous studies on JM22 and A6 TCRs.
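      A flat-bottom harmonic restraint of the kind described above can be sketched as follows: the energy is zero up to the 10 Å threshold and grows quadratically beyond it. The force constant k below is an assumed placeholder, not the value used in the simulations, and the (k/2)(d − d0)² convention is also an assumption, since MD engines differ on whether the factor 1/2 is included.

```python
def flat_bottom_energy(d, d0=10.0, k=1.0):
    """Flat-bottom harmonic restraint: zero for d <= d0, (k/2)(d - d0)^2 beyond.

    d, d0 in angstroms. k is an assumed force constant, not the manuscript's value;
    some MD codes omit the factor 1/2, so check the convention of the engine used.
    """
    if d <= d0:
        return 0.0
    return 0.5 * k * (d - d0) ** 2


def flat_bottom_force(d, d0=10.0, k=1.0):
    """Restoring force along d, F = -dU/dd; negative values pull the atoms together."""
    if d <= d0:
        return 0.0
    return -k * (d - d0)


# The 6.7 Å separation measured in the N15 model lies inside the flat region,
# so the restraint exerts no force there.
for d in (6.7, 10.0, 11.0, 12.0):
    print(d, flat_bottom_energy(d, k=2.0), flat_bottom_force(d, k=2.0))
```

      Because the potential is flat below the threshold, the restraint leaves the native-like separation unperturbed and acts only to prevent excessive separation of the two strands.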

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Clarify the reason for including arguably non-physiological simulations, in which the C domain is missing. Is the overall point that it is essential for proper peptide discrimination?

      R1-c. This is a somewhat philosophical question. Rather than recapitulating experiments, we believe the goal of simulation is to gain insight; hence, a model should be justified by its utility rather than by its direct physiological relevance. The system lacking the C-module is useful because comparing its behavior with that of the full TCRαβ-pMHC complex informs about the allosteric role of the C-module. The increased interfacial stability of Vαβ-pMHC is also consistent with our discovery that the C-module likely undergoes partial unfolding to an extended state, in which the bond lifetime increases (Das, et al., PNAS 2015; Akitsu et al., Sci. Adv., 2024). In this sense, Vαβ-pMHC has a more direct physiological relevance. Furthermore, since single-chain versions of antibodies lacking the C-module (scFv) are in widespread use (Ahmad et al., J. Immunol. Res., 2012), including in CAR T cells, a better understanding of a TCR lacking the C-module may help in developing novel TCR-based immunotherapies. These explanations have been added in lines 253–261.

      (2) Suggest changing Vαβ-pMHC to B7<sup>0</sup>∆C to emphasize that the constant domain is deleted.

      R1-d. While we appreciate the reviewer’s suggestion, the notation Vαβ-pMHC was used in our previous two papers (Hwang, PNAS 2020, Chang-Gonzalez, eLife 2024). We thus prefer to keep the existing notation.

      (3) Suggest adding A6 data to table 1 for comparison, making it clear if it is from a previous paper.

      R1-e. Table 1 of the present manuscript and Table 1 of the A6 paper differ in the items displayed. Instead of merging them, we added the extension and force for A6 corresponding to B7<sup>low</sup> and B7<sup>high</sup> to the caption of Table 1.

      (4) Suggest discussing the catch-bond behavior in terms of departure from equilibrium, e.g. is it possible to distinguish between different (catch vs slip) bond behaviors on the basis of work of separation histograms? If the difference does not show up in equilibrium work, the exponential work averages would be similar, but work histograms could be very different.

      R1-f. Although the energetics of catch versus slip bonds would provide additional insight, this is beyond the scope of the present simulations, which involve neither dissociation events nor slip-bond receptors. We instead briefly mention the energetic aspect in terms of T-cell activation in lines 316–319.

      (5) Have the simulations in Figure 1 reached steady state? The force and occupancy increase almost linearly up until 500ns, then seem to decrease rather dramatically by 750ns. It might be worthwhile to extend one simulation to check.

      R1-g. We did extend the simulation to about 1500 ns. The large and slow fluctuation in force is an inherent property of the system, as explained in R3-a above.

      (6) Is the loss of contacts for B7<sup>0</sup> due to thermalization and relaxation away from the X-ray structure?

      R1-h. The initial thermalization at 300 K is not responsible for the loss of contacts for B7<sup>0</sup>, since we applied distance restraints to the initial contacts to keep them from breaking during the preparatory runs (lines 358–370). While ‘relaxation away from the X-ray structure’ gives the impression that the complex approaches an equilibrium conformation in the absence of crystallographic confinement, our simulations indicate that the stability of the complex depends on the applied load. We made the distinction between relaxation and load-dependent stability clearer in lines 233–238.

      (7) Figure 4 contains a very large amount of data. Could it be simplified and partly moved to SI? For example, panel G is somewhat hard to read at this scale, and seems non-essential to the general reader.

      R1-i. Upon the reviewer’s suggestion, we simplified Figure 4 by moving some of the panels to Figure 4–figure supplement 1. Panels have also been made larger for better readability.

      (8) If the coupling between C and V domains is necessary for catch-bond behavior, can one propose mutations that would disrupt the interface to test by experiment? This would be interesting in light of the authors’ own comment on p. 8 that ’a logical evolutionary pressure would be for the C domains to maximize discriminatory power by adding instability to the TCR chassis,’ which might lead to a verifiable hypothesis.

      R1-j. This has already been tested computationally and experimentally for other TCRs using Cβ FG-loop deletion mutants, which diminish the catch bond (Das, et al., PNAS 2015; Hwang et al., PNAS 2020; Chang-Gonzalez et al., eLife, 2024). Furthermore, the Vγδ-Cαβ chimera, in which the C-module of TCRγδ is replaced by that of TCRαβ to strengthen the V-C coupling, achieved a gain-of-function catch bond, whereas the wild-type TCRγδ is a slip-bond receptor (Mallis, et al., PNAS 2021; Bettencourt et al., Biophys. J. 2024). We added our prediction that FG-loop deletion mutants of the B7 TCR will behave similarly in lines 261–264.

      (9) Regarding extending TCR and MHC termini using native sequences, as described in the methods, what would be the disadvantage of using the same sequence, which could be made much more rigid, e.g. a poly-Pro sequence? After all, the point seems to be applying a roughly constant force, but flexible/disordered linkers seem likely to increase force fluctuation.

      R1-k. The purpose of adding linkers was to allow a certain degree of longitudinal and transverse motion, as would occur in vivo. While it would be worthwhile to explore the effects of linker flexibility on the conformational dynamics of the complex, in the present study we used the native sequences of those proteins for the linkers (lines 341–344).

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 2 is almost illegible, especially Figure 2A-D. I do not think that these contacts vs time would be useful to anyone except for someone interested in this particular pMHC interaction, so I would suggest moving it to a supporting figure and making it much larger.

      R2-e. Thanks for the suggestion. We created Figure 2–figure supplement 1 and made panels larger for clearer presentation.

      (2) Figure 4 is overwhelming, and does not convey any particular message.

      R2-f. This is the same comment as reviewer 1’s comment (7) above. Please see our response R1-i.

      Reviewer #3 (Recommendations for the authors):

      (1) The label “beta2m” in Figure 1A should be moved closer to the beta2 microglobulin domain. A label TCR should be added to Figure 1A.

      R3-c. Thanks for pointing out about β2m. We have corrected it. About putting the label ‘TCR,’ to avoid cluttering, we explained that Vα, Vβ, Cα, and Cβ are the 4 subdomains of TCR in the caption of Figure 1A.

      (2) Hydrogen atoms should be removed from the peptide in Figure 1B.

      R3-d. We have removed the hydrogen atoms.

      (3) The authors should consider moving Figures 1 A-D to the SI and show a simpler description of the contact occupancy than the heat maps. The legend of Figure 2A-D is too small.

      R3-e. By ‘Figures 1 A-D’ we believe the reviewer meant Figure 2A–D. This is the same comment as reviewer 2’s comment (1). Please see our response R2-e above.

      (4) Vertical (dashed) lines should be added to Figure 3E at 500 ns to emphasize the segment of the time series used for the histograms.

      R3-f. We added vertical lines in figures showing time-dependent behaviors, which are Figure 1D, Figure 2B, Figure 2–figure supplement 1F, and Figure 4–figure supplement 1B.

    1. Author response:

      Reviewer #1 (Public Review):

      We are grateful to this reviewer for her/his constructive comments, which have greatly improved our work. Individual responses are provided below.

      The authors recorded from multiple mossy cells (MCs) of the dentate gyrus in slices or in vivo using anesthesia. They recorded MC spontaneous activity during spontaneous sharp waves (SWs) detected in area CA3 (in vitro) or in CA1 (in vivo). They find variability of the depolarization of MCs in response to a SW. They then used deep learning to parse out more information. They conclude that CA3 sends different "information" to different MCs. However, this is not surprising because different CA3 neurons project to different MCs and it was not determined if every SW reflected the same or different subsets of CA3 activity.

      Thank you for your valuable comments. We agree that our finding that different MCs receive different information is not surprising; in fact, it is expected from the anatomical structure of the circuit. However, there is value in demonstrating this physiologically: it was not previously clear whether individual MCs actually receive heterogeneous information at the level of their neural activity, and this had to be tested by recording that activity. We believe this study is important because it demonstrates this fact quantitatively.

      The strengths include recording up to 5 MCs at a time. The major concerns are in the finding that there is variability. This seems logical, not surprising. Also it is not clear how deep learning could lead to the conclusion that CA3 sends different "information" to different MCs. It seems already known from the anatomy because CA3 neurons have diverse axons so they do not converge on only one or a few MCs. Instead they project to different MCs. Even if they would, there are different numbers of boutons and different placement of boutons on the MC dendrites, leading to different effects on MCs. There also is a complex circuitry that is not taken into account in the discussion or in the model used for deep learning. CA3 does not only project to MCs. It also projects to hilar and other dentate gyrus GABAergic neurons which have complex connections to each other, MCs, and CA3. Furthermore, MCs project to MCs, the GABAergic neurons, and CA3. Therefore at any one time that a SW occurs, a very complex circuitry is affected and this could have very different effects on MCs so they would vary in response to the SW. This is further complicated by use of slices where different parts of the circuit are transected from slice to slice.

      The first half of this comment is closely related to the previous one. We propose that the variation in membrane potential across simultaneously recorded MCs allows diverse information to be expressed. We also believe this is highly novel, in that no previous work has described the extent to which SWRs are encoded in MCs. Our study proposes a new quantitative method that relates two variables (LFP and membrane potential) that cannot be compared directly. Specifically, we used machine learning (please note that this is a neural network, but not "deep learning") to achieve this quantification, and we believe this innovation is noteworthy.

      In the latter part of this comment, you raise another important point. First, we would like to clarify a slight misunderstanding. Our goal is not to reproduce the circuit structure of the hippocampus in silico but to propose a "function" (or mapping/transformation) that connects the two modalities, i.e., LFP and Vm. From an explanatory point of view, this function should be as simple as possible. In this respect, our machine learning model is a perceptron-like three-layer neural network. That one of the simplest classical neural network models can predict the LFP waveform from Vm was quite surprising to us, and it is an outcome we had not anticipated. The fact that our model does not include dendrites or inhibitory neurons is therefore not a drawback but an important advantage. On the other hand, we agree that the fact that the data used for our predictions were obtained primarily from slice experiments may be a limitation of this study. However, the new quantitative method we propose is versatile, since we showed that the same machine learning approach can also be applied to in vivo single-cell data.
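      To make the idea of a perceptron-like three-layer mapping from Vm to LFP concrete, here is a minimal NumPy sketch: an input layer receiving a Vm segment, one tanh hidden layer, and a linear output layer producing an LFP segment, fit by full-batch gradient descent on the mean squared error. All specifics (segment lengths, hidden-layer size, tanh activation, MSE loss, learning rate, and the synthetic data) are assumptions for illustration and are not the architecture or training settings reported in the manuscript.

```python
import numpy as np


class ThreeLayerNet:
    """Input -> tanh hidden layer -> linear output, fit by full-batch gradient descent.

    Illustrative only: layer sizes, activation, loss, and learning rate are
    assumptions, not the architecture or settings used in the manuscript.
    """

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def _forward(self, x):
        h = np.tanh(x @ self.w1 + self.b1)  # hidden activations
        return h, h @ self.w2 + self.b2     # hidden layer, linear output

    def predict(self, x):
        return self._forward(x)[1]

    def fit(self, x, y, lr=0.5, epochs=500):
        """Minimize mean squared error; returns the loss recorded at each epoch."""
        losses = []
        for _ in range(epochs):
            h, y_hat = self._forward(x)
            err = y_hat - y
            losses.append(float(np.mean(err ** 2)))
            # Gradients of mean(err**2) with respect to each parameter.
            g_out = 2.0 * err / err.size
            g_w2 = h.T @ g_out
            g_b2 = g_out.sum(axis=0)
            g_h = (g_out @ self.w2.T) * (1.0 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
            g_w1 = x.T @ g_h
            g_b1 = g_h.sum(axis=0)
            self.w1 -= lr * g_w1
            self.b1 -= lr * g_b1
            self.w2 -= lr * g_w2
            self.b2 -= lr * g_b2
        return losses


# Synthetic stand-in data: 200 events, 40-sample Vm segments -> 20-sample LFP segments.
rng = np.random.default_rng(1)
vm = rng.normal(size=(200, 40))
lfp = np.tanh(vm @ rng.normal(scale=0.2, size=(40, 20)))
net = ThreeLayerNet(n_in=40, n_hidden=16, n_out=20)
losses = net.fit(vm, lfp)
print(losses[0], losses[-1])
```

      The design choice mirrors the argument above: a single hidden layer with no explicit dendritic or inhibitory components is enough to define a learnable mapping between the two modalities, which keeps the model interpretable as a transformation rather than a circuit reconstruction.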

      It is also not discussed if SWs have a uniform frequency during the recording session. If they cluster, or if MC action potentials occur just before a SW, or other neurons discharge before, it will affect the response of the MC to the SW. If MC membrane potential varies, this will also affect the depolarization in response to the SW.

      Thank you for raising this important point. We have performed additional analyses in response to your comment. First, we plotted how SWR parameters fluctuated during the recording (especially for recordings longer than 5 minutes). As shown in the new Figure 1 - figure supplement 4, the frequency of SWRs remained uniform during the recording. These data justify pooling data over time.

      We also calculated the average membrane potentials of MCs before and after SWRs and found that, unlike the Vm of CA1 neurons, the Vm of MCs showed no depolarization or hyperpolarization before SWs. These data indicate that the surrounding circuitry was not particularly active before SWs, eliminating any concern that such preceding activity might affect our analysis. These data are shown in Figure 1 - figure supplement 2.
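      A before/after comparison of this kind is typically based on an event-triggered average: Vm segments are cut around each SWR peak and averaged, so that any consistent depolarization or hyperpolarization preceding the SW stands out against the background fluctuations. The sketch below is a hedged illustration; the function name, the sampling rate, and the window lengths are hypothetical, and events too close to the recording edges are simply skipped.

```python
import numpy as np


def event_triggered_average(signal, fs_hz, event_times_s, pre_s, post_s):
    """Average of signal segments from pre_s before to post_s after each event.

    Hypothetical helper. Returns (lags in s, mean segment, number of events used);
    events whose window extends past the recording edges are skipped.
    """
    signal = np.asarray(signal, dtype=float)
    n_pre = int(round(pre_s * fs_hz))
    n_post = int(round(post_s * fs_hz))
    segments = []
    for t in event_times_s:
        i = int(round(t * fs_hz))  # sample index of the event (e.g. SWR peak)
        if i - n_pre < 0 or i + n_post > signal.size:
            continue
        segments.append(signal[i - n_pre:i + n_post])
    if not segments:
        raise ValueError("no events with a complete window")
    lags_s = np.arange(-n_pre, n_post) / fs_hz
    return lags_s, np.mean(segments, axis=0), len(segments)


# Synthetic example: 10 s sampled at 100 Hz, with deflections at two event times.
vm = np.zeros(1000)
vm[200], vm[500] = 1.0, 3.0
lags, avg, n_used = event_triggered_average(vm, 100.0, [2.0, 5.0, 9.95], 0.1, 0.1)
pre_mean, post_mean = avg[lags < 0].mean(), avg[lags >= 0].mean()
print(n_used, pre_mean, post_mean)
```

      Comparing the mean of the averaged trace before lag 0 with its mean afterwards gives a simple test for preceding activity; in real data, one would also compare the pre-event level against a baseline taken far from any SWR.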

      In vivo, the SWs may be quite different than in vitro, but this is not discussed. The circuitry is quite different from in vitro. The effects of urethane could have many confounding influences. Furthermore, how much the in vitro and in vivo SWs tell us about SWs in awake behaving mice is unclear.

      We agree with this point. Ideally, in vitro and in vivo recordings would be made under conditions as similar as possible. However, as you know, patch-clamp recording from mossy cells in vivo is technically challenging, and there is currently no alternative to conducting such experiments under anesthesia. We believe that science advances not merely through theoretical discourse but also through empirical data collected under the conditions currently available. Moreover, as we mentioned in the paper, we believe that in vivo and in vitro SWRs share some properties and a common principle of occurrence, and we observed similar characteristics in the membrane potential responses of MCs to SWRs. Nevertheless, as you have pointed out, data obtained under these limitations require careful interpretation, and we have explicitly stated in the paper both these limitations and the properties common to the data obtained in vivo and in vitro (Page 12, Line 357).

      Also, methods and figures are hard to understand as described below.

      Thank you for all your comments. We have carefully considered the reviewers' comments and improved the text and figure legends. We hope you will take the time to review them.

      Reviewer #2 (Public Review):

      Thank you for the positive evaluations, which have encouraged us to resubmit this manuscript. We have revised our manuscript in accordance with your comments. Our point-by-point responses are as follows:

      • A summary of what the authors were trying to achieve

      Drawing from theoretical insights on the pivotal role of mossy cells (MCs) in pattern separation - a key process in distinguishing between similar memories or inputs - the authors investigated how MCs in the dentate gyrus of the hippocampus encode and process complex neural information. By recording from up to five MCs simultaneously, they focused on membrane potential dynamics linked to sharp wave-ripple complexes (SWRs) originating from the CA3 area. Indeed, using a machine learning approach, they were able to demonstrate that even a single MC's synaptic input can predict a significant portion (approximately 9%) of SWRs, and extrapolation suggested that synaptic input obtained from 27 MCs could account for 90% of the SWR patterns observed. The study further illuminates how individual MCs contribute to a distributed but highly specific encoding system. It demonstrates that SWR clusters associated with one MC seldom overlap with those of another, illustrating a precise and distributed encoding strategy across the MC network.

      We appreciate that this reviewer found scientific value in our manuscript. Thanks to the comments, we were pleased to be able to revise and improve the manuscript. Individual responses are listed below:

      • An account of the major strengths and weaknesses of the methods and results

      Strengths:

      (1) This study is remarkable because it establishes a critical link between the subthreshold activities of individual neurons and the collective dynamics of neuronal populations.

      (2) The authors utilize machine learning to bridge these levels of neuronal activity. They skillfully demonstrate the predictive power of membrane potential fluctuations for neuronal events at the population level and offer new insights into neuronal information processing.

      (3) To investigate sharp wave/ripple-related synaptic activity in mossy cells (MCs), the authors performed challenging experiments using whole-cell current-clamp recordings. These recordings were obtained from up to five neurons in vitro and from single mossy cells in live mice. The latter recordings are particularly valuable as they add to the limited published data on synaptic input to MCs during in vivo ripples.

      We appreciate the reviewer’s critical evaluations, which have encouraged us to revise and resubmit this manuscript. We have revised our manuscript in line with the reviewer’s comments. Our point-by-point responses are provided below:

      Weaknesses:

      (1) The model description could significantly benefit from additional details regarding its architecture, training, and evaluation processes. Providing these details would enhance the paper's transparency, facilitate replication, and strengthen the overall scientific contribution. For further details, please see below.

      Thank you for the suggestions. We have added details of the model in response to the comments below.

      (2) The study recognizes the concept of pattern separation, a central process in hippocampal physiology for discriminating between similar inputs to form distinct memories. The authors refer to a theoretical paper by Myers and Scharfman (2011) that links pattern separation with activity backpropagating from CA3 to mossy cells. Despite this initial citation, the concept is not discussed again in the context of the new findings. Given the significant role of MCs in the dentate gyrus, where pattern separation is thought to occur, it would be valuable to understand the authors' perspective on how their findings might relate to or contribute to existing theories of pattern separation. Could the observed functions of MCs elucidated in this study provide new insights into their contribution to processes underlying pattern separation?

      Thank you for your valuable comment. The role of MCs in pattern separation is described in the discussion as follows:

      “It has been shown through theoretical models that MCs contribute to pattern separation (Myers and Scharfman, 2011). In general, neural information diverges from the entorhinal cortex into the larger granule cell layer and then converges onto the smaller CA3 cell layer, so there is a high possibility of information loss during transmission. A backprojection mechanism via MCs has therefore been proposed as a device to prevent information loss. Indeed, in theoretical models, such backprojection improves pattern separation and memory capacity, and the results are closer to experimental data than those of models without backprojection. However, it was unclear what information individual MCs receive during backprojection. Our results show that CA3 SWRs are distributed and encoded in the MC population, and that, even though MCs are fewer in number than the cells of other regions, about 30% of the SWRs in CA3 can be reproduced from the membrane potentials of only five MCs. These results suggest that MCs not only play a role in preventing information loss but also receive newly encoded memory information from the CA3 region, and it is highly likely that the information carried by the backprojections differs from the neural information transmitted through the conventional forward pathway. Indeed, the fact that information replayed in CA3 is reflected in SWRs and propagated to other brain regions suggests that newly encoded memory information in CA3 is also propagated to MCs. If backprojection simply returned the information transmitted from the DG to CA3 back to MCs, this would be unrealistic and extremely inefficient. However, it is still unclear what kind of memory information is actually backprojected and distributed to MCs, and how it differs from the memory information transmitted in the forward direction. These are open questions that need to be addressed in future experiments in awake animals.” (Page 11, Line 333)

      (3) Previous work concluded that sharp waves are associated with mossy cell inhibition, as evidenced by a consistent ripple function-related hyperpolarization of the membrane potential in these neurons when recorded at resting membrane potential (Henze & Buzsáki, 2007). In contrast, the present study reveals an SWR-induced depolarization of the membrane potential. Can the authors explain the observed modulation of the membrane potential during CA1 ripples in more detail? What was the proportion of cases of depolarization or hyperpolarization? What were the respective amplitude distributions? Were there cases of activation of the MCs, i.e., spiking associated with the ripple? This more comprehensive information would add significance to the study as it is not currently available in the literature.

      We apologize for the confusion. First, we did not state in the paper that in vivo MCs depolarize during SWRs. The following sentences have been added to the Results:

      “Previous research has reported SWR-associated hyperpolarization of the MC membrane potential, suggesting that SWRs are related to the inhibition of mossy cells (Henze and Buzsáki, 2007). In our data, however, depolarization and hyperpolarization occurred in roughly equal proportions, with a slight excess of depolarization. It should be noted that MCs are highly active cells with fluctuating membrane potentials, so whether they are classified as depolarized or hyperpolarized depends strongly on the method of analysis. Moreover, the firing rate of the MCs we recorded was 1.07 ± 0.93 Hz (mean ± SD from 6 cells, 6 mice), and 6.68 ± 4.79% (mean ± SD from 6 cells, 6 mice, n = 757 SWR events) of all SWRs recruited MC firing (calculated as firing within 50 ms after the SWR peak).” (Page 5, Line 143)
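      To make the recruitment measure above concrete, the following is a minimal Python sketch (not the authors' analysis code) that computes, for one cell, the fraction of SWRs followed by at least one MC spike within 50 ms of the SWR peak. The function name, the use of seconds, and the half-open window (peak, peak + 50 ms] are our assumptions; per-cell fractions computed this way would then be summarized as mean ± SD across cells.

```python
import numpy as np


def swr_recruitment_fraction(swr_peak_times, spike_times, window_s=0.05):
    """Fraction of SWRs followed by >= 1 spike within window_s after the SWR peak.

    Times are in seconds. The window is half-open, (peak, peak + window_s]:
    a spike exactly at the peak is not counted, one at peak + window_s is.
    (Hypothetical helper; the window convention is an assumption.)
    """
    peaks = np.asarray(swr_peak_times, dtype=float)
    spikes = np.sort(np.asarray(spike_times, dtype=float))
    if peaks.size == 0:
        return float("nan")  # undefined without any SWR events
    # Count spikes <= peak and <= peak + window using binary search.
    n_up_to_peak = np.searchsorted(spikes, peaks, side="right")
    n_up_to_end = np.searchsorted(spikes, peaks + window_s, side="right")
    recruited = (n_up_to_end - n_up_to_peak) > 0
    return float(recruited.mean())


# Hypothetical example: spikes follow only the 1st and 3rd SWR peaks.
example = swr_recruitment_fraction([1.0, 2.0, 3.0], [1.01, 2.2, 3.049])  # 2/3
```

Using `searchsorted` on sorted spike times keeps the count per event logarithmic in the number of spikes, which matters little for six cells but scales to long recordings.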

      (4) In the study, the observation that mossy cells (MCs) in the lower (infrapyramidal) blade of the dentate gyrus (DG) show higher predictability in SWR patterns is both intriguing and notable. This finding, however, appears to be mentioned without subsequent in-depth exploration or discussion. One wonders if this observed predictability might be influenced by potential disruptions or severed connections inherent to the brain slice preparation method used. Furthermore, it prompts the question of whether similar observations or trends have been noted in MCs recorded in vivo, which could either corroborate or challenge this intriguing in vitro finding.

      As you pointed out, we cannot rule out the possibility that this predictability is influenced by potential disruptions or disconnections inherent in the preparation of acute slices. Regarding the anatomical location of MCs recorded in vivo, our dataset is limited to six cells, because simultaneous SWR and MC patch-clamp recordings are very difficult even under anesthesia. It is therefore difficult to establish statistical significance with the current data. We have added the following text to the Discussion:

      “In addition, the finding that SWRs are more predictable when the recorded MC is located near the lower blade of the DG was unexpected, so we cannot rule out the possibility that this result was influenced by potential disruptions or severed connections during preparation of the acute slices.” (Page 14, Line 405)

      (5) The study's comparison of SWR predictability by mossy cells (MCs) is complicated by using different recording sites: CA3 for in vitro and CA1 for in vivo experiments, as shown in Fig. 2. Since CA1-SWRs can also arise from regions other than CA3 (see e.g. Oliva et al., 2016, Yamamoto and Tonegawa, 2017), it is difficult to reconcile in vitro and in vivo results. Addressing this difference and its implications for MC predictability in the results discussion would strengthen the study.

      Thank you for your comment. In response, we have added the following to the Discussion:

      “In this study, we performed MC patch-clamp recordings both in vivo and in vitro, and showed that SWRs can be predicted from MC V_m in both cases. However, there are three caveats to the interpretation of these data. First, in vivo SWRs cannot be considered identical to in vitro SWRs, although in vitro SWRs share some features with in vivo SWRs, such as spatial and spectral profiles and neural activity patterns (Maier et al., 2009; Hájos et al., 2013; Pangalos et al., 2013). The same concern applies to MC synaptic inputs. In vivo V_m data may contain more information than in vitro single-MC data, because all projections targeting MCs are intact, providing a complete set of SWR-related synaptic inputs, whereas connections are severed in slices. While we recognize these differences, it is also very likely that the two preparations share common ways of expressing information. Second, because the in vivo LFP recordings were obtained from the CA1 region, CA1-SWRs may also receive input from the CA2 region (Oliva et al., 2016) and the entorhinal cortex (Yamamoto and Tonegawa, 2017). In addition, urethane anesthesia has been observed to reduce subthreshold activity, spike synchronization, and SWRs (Yagishita et al., 2020), making complete agreement with in vitro SWRs recorded from the CA3 region difficult to achieve. Finally, although we were able to record MC V_m during in vivo SWRs in this study, the in vivo dataset consisted of single-MC recordings, in contrast to the in vitro dataset. To perform the same analysis as in the in vitro experiments, it would be desirable to record LFPs from the CA3 region and collect data from multiple MCs simultaneously, but this is technically very difficult.
In this study, we could not directly establish the correspondence between CA3 network activity and in vivo MC synaptic input, but the fact that the CA1-SWR waveform can be predicted from in vivo MC V_m may result from some CA3 network activity being reflected in CA1-SWRs. It is undeniable that more accurate predictions would have been possible had LFPs been recorded from the CA3 region in vivo.” (Page 12, Line 357)

      • An appraisal of whether the authors achieved their aims, and whether the results support their conclusions

      As outlined in the abstract and introduction, the primary aim is to investigate the role of MCs in encoding neuronal information during sharp wave ripple complexes, a crucial neuronal process involved in memory consolidation and information transmission in the hippocampus. It is clear from the comprehensive details in this study that the authors have meticulously pursued their goals by providing extensive experimental evidence and utilizing innovative machine learning techniques to investigate the encoding of information in the hippocampus by mossy cells (MCs). Together, this study provides a compelling account supported by rigorous experimental and analytical methods. Linking subthreshold membrane potentials and population activity by machine learning provides a comprehensive new analytic approach and sheds new light on the role of MCs in information processing in the hippocampus. The study not only achieves the stated goals, but also provides novel methodology, and valuable insights into the dynamics of neural coding and information flow in the hippocampus.

      We appreciate the reviewer’s critical evaluations, which have encouraged us to revise and resubmit this manuscript. We have revised our manuscript in line with the reviewer’s comments.

      • A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community

      Impact: Both the novel methodology and the provided biological insights will be of great interest to the community.

      Utility of methods/data: The applied deep learning approach will be of particular interest if the authors provide more details to improve its reproducibility (see related suggestions below).

      We appreciate that this reviewer found scientific value in our manuscript. Thank you for the comments.

      Reviewer #3 (Public Review):

      We appreciate that this reviewer raised several important issues. We are pleased to have been able to revise the paper into a better manuscript based on these comments. Individual responses are listed below:

      Compared to the pyramidal cells of the CA1 and CA3 regions of the hippocampus, and the granule cells of the dentate gyrus (DG), the computational role(s) of mossy cells of the DG have received much less attention over the years and are consequently not well understood. Mossy cells receive feedforward input from granule cells and feedback from CA3 cells. One significant factor is the compression of the large number of CA3 cells that input onto a much smaller population of mossy cells, which then send feedback connections to the granule cell layer. The present paper seeks to understand this compression in terms of neural coding, and asks whether the subthreshold activity of a small number of mossy cells can predict above chance levels the shapes of individual SWs produced by the CA3 cells. Using elegant multielectrode intracellular recordings of mossy cells, the authors use deep learning networks to show that they can train the network to "predict" the shape of a SW that preceded the intracellular activity of the mossy cells. Putatively, a single mossy cell can predict the shape of SWs above chance. These results are interesting, but there are some conceptual issues and questions about the statistical tests that must be addressed before the results can be considered convincing.

      We appreciate that this reviewer found scientific value in our manuscript. Thanks to the comments, we were pleased to be able to revise and improve the manuscript. Individual responses are listed below:

      Strengths

      (1) The paper uses technically challenging techniques to record from multiple mossy cells at the same time, while also recording SWs from the LFP of the CA3 layer. The data appear to be collected carefully and analyzed thoughtfully.

      (2) The question of how mossy cells process feedback input from CA3 is important to understand the role of this feedback pathway in hippocampal processing.

      (3) Provided that the concerns expressed below about proper statistical testing are resolved, the data appear supportive of the main conclusions of the authors and suggest that, to some degree, the much smaller population of mossy cells can conserve the information present in the larger population of CA3 cells, presumably by using a more compressed, dense population code.

      We appreciate the reviewer’s critical evaluations, which have encouraged us to revise and resubmit this manuscript. We have revised our manuscript in line with the reviewer’s comments. Our point-by-point responses are provided below:

      Weaknesses

      (4) Some of the statistical tests appear inappropriate because they treat each CA3 SW and associated Vm from a mossy cell as independent samples. This violates the assumptions of statistical tests such as the Kolmogorov-Smirnov tests of Figure 3C and Fig 3E. Although there is large variability among the SWs recorded and among the Vm's, they cannot be considered independent measurements if they derive from the same cell and same recording site of an individual animal. This becomes especially problematic when the number of dependent samples adds up to the tens of thousands, providing highly inflated numbers of samples that artificially reduce the p values. Techniques such as mixed-effects models are being increasingly used to factor out the effects of within cell and within animal correlations in the data. The authors need to do something similar to factor out these contributions in order to perform statistical tests, throughout the manuscript when this problem occurs.

      Thank you for the insightful comment. Regarding correlations between animals, because all mice were obtained at the same age and housed in the same environment, we do not think it necessary to account for differences due to environmental factors. As the reviewer pointed out, we cannot completely rule out the possibility that within-cell or within-animal correlations influence the results, so we plotted the differences in prediction accuracy between cells, slices, and animals (Figure 3 - figure supplement 7). The prediction accuracy for the real data was better than that for the shuffled data in 66 of the 87 MCs (75.9%). In response to the comment that measurements from the same animal do not constitute independent samples, we calculated the average ΔRMSE for each mouse; these values were significantly different from 0 (n = 14, *p = 0.0041, Student’s t-test). In other words, statistically significant differences are obtained even when each animal is treated as an independent sample.
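      The per-animal analysis described above can be sketched as follows, assuming one ΔRMSE value per MC tagged with a mouse identifier (the record layout and function names are hypothetical). Averaging within each mouse before testing makes n the number of animals rather than the number of cells or SWR events. This illustrates the simpler aggregation approach described in the response, not the mixed-effects model the reviewer suggested.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def per_animal_means(records):
    """Average a per-cell metric within each animal.

    records: iterable of (animal_id, delta_rmse) pairs, e.g. one pair per MC.
    Returns {animal_id: mean delta_rmse}. (Hypothetical helper.)
    """
    groups = defaultdict(list)
    for animal_id, value in records:
        groups[animal_id].append(value)
    return {animal: mean(values) for animal, values in groups.items()}


def one_sample_t(values, popmean=0.0):
    """t statistic and degrees of freedom for a one-sample Student's t-test."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two animals")
    t = (mean(values) - popmean) / (stdev(values) / sqrt(n))
    return t, n - 1


# Hypothetical values: two MCs from mouse "m1", one each from "m2" and "m3".
animal_means = per_animal_means([("m1", 0.1), ("m1", 0.3), ("m2", 0.2), ("m3", 0.4)])
t_stat, df = one_sample_t(list(animal_means.values()))  # t ≈ 4.0, df = 2
```

The p-value then follows from a t distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.ttest_1samp(values, 0.0)` performs both steps.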

      (5) A separate statistical problem occurs when comparing real data against a shuffled, surrogate data set. From the methods, I gather that Figure 3C combined data from 100 surrogate shuffles to compare to the real data. It is inappropriate to do a classic statistical test of data against such shuffles, because the number of points in the pooled surrogate data sets are not true samples from a population. It is a mathematical certainty that one can eventually drive a p value to < 0.05 just by increasing the number of shuffles sufficiently. Thus, the p value is determined by the number of computer shuffles allowed by the time and processing power of a computer, rather than by sampling real data from the population. Figures such as 4C and 5A are examples that test data against shuffle appropriately, as a single value is determined to be within or outside the 95% confidence interval of the shuffle, and this determination is not directly affected by the number of shuffles performed.

      Thank you for raising a very good point. We understand the reviewer's comment, but we cannot fully agree that "It is a mathematical certainty that one can eventually drive a p value to < 0.05 just by increasing the number of shuffles sufficiently", because when the compared data do not differ at all, no amount of shuffling will produce a significant difference. We do agree, however, that increasing the number of shuffles will lower the p-value whenever the data differ even slightly. Based on the reviewer's comments, we used a paired t-test to test whether the difference between RMSE_real and RMSE_surrogate was significantly different from 0, and found that it was (Figure 3 - figure supplement 5). With the paired t-test, as in Figure 3E, the prediction error for the real data differed significantly from that for the shuffled data for all numbers of input MCs and also for the in vivo data.
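      A minimal sketch of the two ways of comparing real and shuffled predictions discussed in this exchange, assuming RMSEs are stored as one real value per unit (an MC or MC combination) and an array of surrogate values per unit (names are hypothetical). Collapsing the shuffles to one surrogate mean per unit before a paired test keeps n equal to the number of units, so the result does not depend on how many shuffles were run; the rank-based p-value compares a single observed value against the shuffle distribution, in the spirit of the approach the reviewer describes for Figures 4C and 5A.

```python
import numpy as np


def paired_differences(rmse_real, rmse_surrogate):
    """Per-unit RMSE_surrogate (averaged over shuffles) minus RMSE_real.

    rmse_real: shape (n_units,); rmse_surrogate: shape (n_units, n_shuffles).
    Positive values mean the real data predict the SWR better than shuffles.
    One value per unit, so n does not grow with the number of shuffles.
    """
    real = np.asarray(rmse_real, dtype=float)
    surrogate_mean = np.asarray(rmse_surrogate, dtype=float).mean(axis=1)
    return surrogate_mean - real


def empirical_p(observed_rmse, null_rmse):
    """One-sided rank p-value: chance a shuffle predicts at least as well.

    Lower RMSE is better, so shuffles with RMSE <= observed count against H1.
    The +1 terms include the observed value itself, so p >= 1 / (N + 1).
    """
    null_rmse = np.asarray(null_rmse, dtype=float)
    return float((1 + np.sum(null_rmse <= observed_rmse)) / (1 + null_rmse.size))
```

A one-sample t-test of the per-unit differences against 0 is equivalent to a paired t-test (for example `scipy.stats.ttest_rel`) on the real and shuffle-averaged RMSEs. For the rank p-value, more shuffles only refine the estimate of the tail probability rather than driving it toward zero.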

      (6) The last line of the Discussion states that this study provides "important insights into the information processing of neural circuits at the bottleneck layer," but it is not clear what these insights are. If the statistical problems are addressed appropriately, then the results do demonstrate that the information that is reflected in SWs can be reconstructed by cells in the MC bottleneck, but it is not certain what conceptual insights the authors have in mind. They should discuss more how these results further our understanding of the function of the feedback connection from CA3 to the mossy cells, and discuss any limitations on their interpretation arising from recording LFPs rather than single-unit ensemble activity (where the information is really encoded).

      Thank you for your insightful comment. We have added the following text to the discussion:

      “Given that different SWRs may encode information correlated with different experiences, the activity of individual MCs may also play a role in encoding different experiences via SWRs. Indeed, several in vivo studies have confirmed that MC activity is involved in spatial encoding (Bui et al., 2018; Huang et al., 2024). However, its relationship with SWRs has not been investigated. The finding that SWRs recorded from CA3 are reflected as synaptic input to MCs is significant because it not only reveals a transmission pathway from CA3 to MCs but also exposes the subthreshold information that leads to firing; more broadly, it brings us closer to the mechanism of information processing by neuronal firing. Moreover, synaptic input to MCs is not uniform but varies with the SWR pattern. Given previous research showing that diversity is important for information representation (Padmanabhan and Urban, 2010; Tripathy et al., 2013), this heterogeneity at the membrane potential level, rather than the all-or-none output of neuronal firing, may be key to encoding more precise information. In this respect, our research, which focuses on information encoding at the subthreshold level, may be able to extract even more information than is encoded by firing activity.” (Page 14, Line 419)

      (7) In Figure 1C, the maximum of the MC response on the first inset precedes the SW, and the onset of the Vm response may be simultaneous with SW. This would suggest that the SW did not drive the mossy cell, but this was a coincident event. How many SW-mossy cell recordings are like this? Do the authors have a technical reason to believe that these are events in which the mossy cell is driven by the CA3 cells active during the SW?

      Thank you for your insightful comment. Based on your comment, we aligned all MC EPSPs to SWR onset and found that the EPSPs rise after SWR onset (Figure 1 - figure supplement 2). This leads us to believe that the MC EPSPs are most likely driven by the SWRs.
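      The onset-aligned comparison described above could be implemented along these lines. This is a sketch under stated assumptions, not the authors' pipeline: SWR onset times are taken as given (the onset-detection criterion is not specified here), V_m is a single uniformly sampled trace, each snippet is baseline-subtracted using its pre-onset window, and events too close to the trace edges are skipped. A response that rises only at positive lags in the averaged trace is consistent with SWR-driven input.

```python
import numpy as np


def onset_aligned_average(vm, fs, onset_times, pre_s=0.1, post_s=0.2, baseline=True):
    """Average V_m snippets aligned to SWR onset times (hypothetical helper).

    vm: 1-D membrane potential trace sampled at fs (Hz).
    onset_times: SWR onset times in seconds (detection assumed upstream).
    Each snippet spans [onset - pre_s, onset + post_s); if baseline is True,
    the mean of its pre-onset part is subtracted. Events whose window would
    extend past either end of the trace are skipped.
    Returns (lag_s, mean_trace), with lag 0 at SWR onset.
    """
    vm = np.asarray(vm, dtype=float)
    pre, post = int(round(pre_s * fs)), int(round(post_s * fs))
    snippets = []
    for t in onset_times:
        i = int(round(t * fs))
        if i - pre < 0 or i + post > vm.size:
            continue  # window falls outside the recording
        seg = vm[i - pre:i + post]
        if baseline and pre > 0:
            seg = seg - seg[:pre].mean()
        snippets.append(seg)
    lag_s = np.arange(-pre, post) / fs
    if not snippets:
        return lag_s, np.full(pre + post, np.nan)
    return lag_s, np.mean(snippets, axis=0)
```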

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript described a structure-guided approach to graft important antigenic loops of the neuraminidase to a homotypic but heterologous NA. This approach allows the generation of well-expressed and thermostable recombinant proteins with antigenic epitopes of choice to some extent. The loop-grafted NA was designated hybrid.

      Strengths:

      The hybrid NA appeared to be more structurally stable than the loop-donor protein while acquiring its antigenicity. This approach is of value when developing a subunit NA vaccine which is difficult to express. So that antigenic loops could be potentially grafted to a stable NA scaffold to transfer strain-specific antigenicity.

      Weaknesses:

      However, major revisions to better organize the text, and figure and make clarifications on a number of points, are needed. There are a few cases in which a later figure was described first, data in the figures were not sufficiently described, or where there were mismatched references to figures.

      More importantly, the hybrid proteins did not show any of the advantages over the loop-donor protein in the format of VLP vaccine in mouse studies, so it's not clear why such an approach is needed to begin with if the original protein is doing fine.

      We thank the reviewer for their helpful comments. We have incorporated the reviewers' feedback to improve the manuscript. Please see our point-by-point response.

      The purpose of loop-grafting between H5N1/2021 (a high-expressor) and the PR8 virus was not to improve the expression of PR8, which is already a good expressing NA. Instead, the loop-grafting and the in vivo experiments were done to show the loop-specific protection following a lethal PR8 virus challenge.

      Reviewer #2 (Public review):

      In their manuscript, Rijal and colleagues describe a 'loop grafting' strategy to enhance expression levels and stability of recombinant neuraminidase. The work is interesting and important, but there are several points that need the authors' attention.

      Major points

      (1) The authors overstress the importance of the epitopes covered by the loops they use and play down the importance of antibodies binding to the side, the edges, or the underside of the NA. A number of papers describing those mAbs are also not included.

      We have discussed the distribution of epitopes on the NA molecule in the Discussion section "The distribution of epitopes in neuraminidase" (new line number 350). In Supplementary Figures 1 and 2, we have compiled the epitopes reported for polyclonal sera and mAbs via escape virus selection or crystal structural studies. There are 45 residues identified by escape virus selection, and we found that approximately 90% of the epitopes are located within the top loops (Loops 01 and Loops 23, which include the lateral sides and edges of NA). We have also included the epitopes of the underside mAbs NDS.1 and NDS.3 in Supplementary Figure 2. Some of the interactions formed by these mAbs are also within the L01 and L23 loops. All relevant references are cited in Supplementary Figures 1 and 2.

      A new figure has been added [Figure 1b (ii)] to illustrate the surface mapping of epitopes on NA.

      (2) The rationale regarding the PR8 hybrid is not well described and should be described better.

      We have described the rationale for the PR8 hybrid (new lines 247-250). For clarity, we have added the following sentence within the section "Loop transfer between two distant N1 NAs:...."

      (new lines 255-258):

      "mSN1 showed sufficient cross-reactivity to N1/09 to protect mice against virus challenge. Therefore, we performed loop transfer between mSN1 and PR8N1, which differ by 18 residues within the L01 and L23 loops and show no or minimal cross-reactivity, to assess the loop-specific protection."

      (3) Figure 3B and 6C: This should be given as numbers (quantified), not as '+'.

      We have included the numerical data in Supplementary Figure 6. The data are presented semi-quantitatively for simplicity. To improve clarity, we have now added the following sentence to the Figure 3c legend: "Refer to Supplementary Figure 6 for binding titration data".

      (4) Figure 5A and 7A: Negative controls are missing.

      A pool of empty-VLP sera was included as a negative control and showed no inhibition at a 1:40 dilution. In the figure legends, we have stated: "Pooled sera to unconjugated mi3 VLP served as the negative control and showed no inhibition at 1:40 dilution (not included in the graphs)".

      (5) The authors claim that they generate stable tetramers. Judging from SDS-PAGE provided in Supplementary Figure 3B (BS3-crosslinked), many different species are present including monomers, dimers, tetramers, and degradation products of tetramers. In lane 7, for example, there are at least 5 bands.

      The tetrameric conformation of the soluble proteins is evidenced by the size-exclusion chromatograms shown in Figures 3a and 6b. The BS3-crosslinked SDS-PAGE data are only suggestive, indicating that the protein is a tetramer if a band appears at ~250 kDa. However, depending on the reaction conditions, lower-molecular-weight bands may also be observed if crosslinking is incomplete.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Wu et al. introduce a novel approach to reactivate the Müller glia cell cycle in the mouse retina by simultaneously reducing p27Kip1 and increasing cyclin D1 using a single AAV vector. The approach effectively promotes Müller glia proliferation and reprogramming without disrupting retinal structure or function. Interestingly, reactivation of the Müller glia cell cycle downregulates the IFN pathway, which may contribute to the induced retinal regeneration. The results presented in this manuscript may offer a promising approach for developing Müller glia cell-mediated regenerative therapies for retinal diseases.

      Strengths:

      The data are convincing and supported by appropriate, validated methodology. These results are both technically and scientifically exciting and are likely to appeal to retinal specialists and neuroscientists in general.

      Weaknesses:

      There are some data gaps that need to be addressed.

      (1) Please label the time points of AAV injection, EdU labeling, and harvest in Figure 1B.

      We thank the reviewer for highlighting the lack of clarity in our experimental design. We have labeled all experiment timelines in the figures where appropriate in the revised version.

      (2) What fraction of Müller cells were transduced by AAV under the experimental conditions?

      We apologize for not clearly explaining the AAV transduction efficiency. AAV transduction efficiency was not uniform across the retina: the region adjacent to the optic nerve exhibits a transduction efficiency of nearly 100%, whereas the peripheral retina shows lower efficiency than the central region. Representative retinal sections with the typical infection pattern are shown in Supplementary Figure 4. EdU+ MG and other markers were quantified in the 250 µm region with the highest efficiency. For the scRNA-seq experiment, retinal regions with high AAV transduction efficiency were dissected with the aid of a control GFP virus.

      (3) It seems unusually rapid for MG proliferation to begin as early as the third day after CCA injection. Can the authors provide evidence for cyclin D1 overexpression and p27 Kip1 knockdown three days after CCA injection?

      We have included data showing that GFP expression is evident at 3 days after AAV-GFP-GFP injection (Supplementary Fig. 1B). Additionally, we performed immunostaining that confirmed cyclin D1 overexpression at 3 days post CCA injection (Fig. 2E), as well as qPCR analysis confirming cyclin D1 overexpression and p27Kip1 knockdown at the same time point (Supplementary Fig. 5).

      (4) The authors reported that MG proliferation largely ceased two weeks after CCA treatment. While this is an interesting finding, the explanation that it might be due to the dilution of AAV episomal genome copies in the dividing cells seems far-fetched.

      We agree with the reviewer that dilution of AAV episomal genomes is unlikely to be the sole reason for the cessation of MG proliferation. By staining cyclin D1 at various days post CCA injection, we found that cyclin D1 is immediately downregulated in mitotic MG undergoing interkinetic nuclear migration to the outer nuclear layer (Fig. 2G-I). In contrast, the effect of p27Kip1 knockdown by CCA lasted longer (Supplementary Figures 9-10). Other anti-proliferative genes may be involved in the immediate downregulation of cyclin D1.

      Reviewer #2 (Public Review):

      This manuscript by Wu, Liao et al. reports that simultaneous knockdown of P27Kip1 with overexpression of Cyclin D can stimulate Muller glia to re-enter the cell cycle in the mouse retina. There is intense interest in reprogramming mammalian muller glia into a source for neurogenic progenitors, in the hopes that these cells could be a source for neuronal replacement in neurodegenerative diseases. Previous work in the field has shown ways in which mouse Muller glia can be neurogenically reprogrammed and these studies have shown cell cycle re-entry prior to neurogenesis. In other works, typically, the extent of glial proliferation is limited, and the authors of this study highlight the importance of stimulating large numbers of Muller glia to re-enter the cell cycle with the hopes they will differentiate into neurons. While the evidence for stimulating proliferation in this study is convincing, the evidence for neurogenesis in this study is not convincing or robust, suggesting that stimulating cell cycle-reentry may not be associated with increasing regeneration without another proneural stimulus.

      Below are concerns and suggestions.

      Intro:

      (1) The authors cite past studies showing "direct conversion" of MG into neurons. However, these studies (PMID: 34686336; 36417510) show EdU+ MG-derived neurons suggesting cell cycle re-entry does occur in these strategies of proneural TF overexpression.

      We thank the reviewer for pointing this out. We have revised the statement to "MG reprogramming".

      (2) Multiple citations are incorrectly listed using the authors' first names only (e.g., Yumi et al.; Levi et al.). Studies are also incompletely referenced in the references.

      We apologize for the errors in the references and have corrected them in the revised version.

      Figure 1:

      (3) When are these experiments ending? On Figure 1B it says "analysis" on the end of the paradigm without an actual day associated with this. This is the case for many later figures too. The authors should update the paradigms to accurately reflect experimental end points.

      We thank the reviewer for highlighting the lack of clarity in our experimental design. We have labeled all experiment timelines in the figures where appropriate in the revised version.

      (4) Are there better representative pictures for P27kd and CyclinD OE? The EdU+ counts indicate a 3-fold increase between Figure 1D and 1E; however, the pictures do not reflect this. In fact, most of the EdU+ cells in Figure 1E don't seem to be Sox9+ MG but rather horizontally oriented nuclei in the OPL that are likely microglia.

      We thank the reviewer for pointing this out. We have replaced the image of the cyclin D1 OE retina with a more representative one.

      (5) Is the infection efficacy of these viruses different between different combinations (i.e. CyclinD OE vs. P27kd vs. control vs. CCA combo)? As the counts are shown in Figure 1G only Sox9+/Edu+ cells are shown not divided by virus efficacy. If these are absolute counts blind to where the virus is and how many cells the virus hits, if the virus efficacy varies in efficiency this could drive absolute differences that aren't actually biological.

      To rule out the possibility that differences in MG proliferation across groups are due to variations in viral efficacy, we examined p27Kip1 knockdown and cyclin D1 overexpression efficiencies for all four groups by qPCR analysis. Cyclin D1 overexpression by the AAV-GFAP-Cyclin D1 virus alone and p27Kip1 knockdown by AAV-GFAP-mCherry-p27kip1 shRNA1 were comparable to, if not higher than, those achieved by the CCA virus (Supplementary Fig 5). Therefore, differences in viral efficacy cannot explain the drastic increase in MG proliferation by CCA.
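      The response does not state how qPCR expression levels were normalized; one common choice is the 2^(−ΔΔCt) method, sketched below purely as an illustration under that assumption (it also assumes near-100% amplification efficiency for target and reference genes, and the function name and Ct values are hypothetical). A fold change above 1 would indicate overexpression (as intended for cyclin D1) and a value below 1 would indicate knockdown (as intended for p27Kip1).

```python
def relative_expression_ddct(ct_target_treated, ct_ref_treated,
                             ct_target_control, ct_ref_control):
    """Fold change of a target gene, treated vs control, by 2^(-ddCt).

    Each Ct is normalized to a reference (housekeeping) gene measured in the
    same sample. Assumes ~100% amplification efficiency for both genes.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target amplifies 2 cycles earlier after treatment.
fold = relative_expression_ddct(20.0, 18.0, 22.0, 18.0)  # 4.0 (overexpression)
```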

      Because the central retina usually had 100% infection efficacy (Supplementary Fig. 4), we quantified EdU+Sox9+ cell numbers in the 250 µm region next to the optic nerve.

      (6) According to the Jax laboratories, mice aren't considered aged until they are over 18 months old. While it is interesting that CCA treatment does not seem to lose efficacy over maturation, I would rephrase the findings, as the experiment does not test this virus in aged retinas.

      We thank the reviewer for bringing this to our attention. We have changed the wording to “older adult mice” in the revised manuscript.

      (7) Supplemental Figure 2c-d. These viruses do not hit 100% of MG, however 100% of the P27Kip staining is gone in the P27sh1 treatment, even the P27+ cell in the GCL that is likely an astrocyte has no staining in the shRNA 1 picture. Why is this?

      We have replaced the images in Supplementary Fig. 2B-D.

      Figure 2

      (8) Would you expect cells to go through two rounds of cell cycle in such a short time? The treatment of giving Edu then BrdU 24 hours later would have to catch a cell going through two rounds of division in a very short amount of time. Again the end point should be added graphically to this figure.

      We thank the reviewer for the comment. We repeated the EdU/BrdU co-labeling experiment with extended periods of EdU/BrdU injection. Based on the time course of MG proliferation (Fig. 2A), we gave five EdU injections from D1 to D5 and five BrdU injections from D6 to D10 post-CCA injection, covering the major phase of MG proliferation (Fig. 2B-C). Consistent with our previous findings, we did not observe any BrdU+EdU+ double-positive MG.

      Additionally, we showed that cyclin D1 overexpression immediately ceased in migrating mitotic MG (Fig. 2G-I), which may explain why CCA-treated MG do not progress to the second round of cell division.

      Figure 3

      (9) I am confused by the mixing ratios of viruses used to indicate infection success. I understand that mixtures of viruses containing CCA, control GFP, or control LacZ were injected. Was the idea to probe for GFP or LacZ in the single-cell data to see which cells were infected but not treated? This does not appear to be shown anywhere.

      Viral infection was not uniform across the entire retina (Supplementary Fig. 4). To mark the infection hotspots, we added 10% GFP virus to the mixture. Regions of the retina with low infection efficiency were removed by dissection and excluded from the scRNA-seq analysis. Therefore, we assumed that the vast majority of MG were infected by CCA. We apologize for not clearly explaining this methodological detail in the original text. We have added the experimental design to Fig. 3A and revised the Results section (lines 191-196) accordingly.

      (10) The majority of glia sorted by TdTomato are probably not infected with virus. Can you subset only the infected cells for analysis? Otherwise, it is very hard to make population-level judgements such as those in Figure 3E-H if a large portion are essentially WT glia.

      This question is related to the last one. Since the regions with high virus infection efficiency were selectively dissected and isolated for analysis, the CCA-infected MG should constitute the vast majority of MG in the scRNA-seq data.

      (11) In Figure 3C, Rho is expressed everywhere, which is common in studies like this because ambient RNA is so high. This makes it very hard to talk about "Rod-like" MG, as this is probably an artifact of the technique. Almost all scRNA-seq studies of MG reprogramming have shown clusters of "rods" with hybrid MG gene expression, and these have previously been considered artifacts.

      We agree with the reviewer that the high rod gene expression in the rod-MG cluster is an artifact. We performed multiple rounds of RNA in situ hybridization on isolated MG nuclei. The counts of Gnat1 and Rho mRNA puncta largely overlapped between samples with and without CCA treatment (Supplementary Fig 14). Some MG in control retinas without CCA treatment had up to 7 or 8 dots per cell, suggesting contamination by attached rod cell debris during retina dissociation (Supplementary Fig 14). Therefore, the results do not support the idea that rod-MG are a reprogrammed MG population with upregulated rod genes.

      (12) It is stated that the "glial" signature is downregulated in response to CCA treatment. Where is this shown convincingly? Figure 3H has a feature plot of Glul, and it is not clear that its expression changes between treatments. Otherwise, MG genes are shown as a function of cluster, not treatment.

      We have added box plots of several MG-specific genes to illustrate the downregulation of the glial signature in the relevant cell cluster in the revised manuscript (Supplementary Fig. 15).

      Figure 4

      (13) The authors should be commended for being very careful in their interpretations. They employ the proper controls (Er-Cre lineage tracing/EdU pulse-chase/scRNA-seq omics) and were very careful in attempting to identify MG-derived rods. This makes the conclusion from the FISH perplexing. The few Rho and GNAT puncta in MG are not convincing to this reviewer: Rho and GNAT dots are dense everywhere throughout the ONL, and any random circle drawn in the ONL would be full of dots. The rigor of these counts is also questionable because some dots are detected in MG in the INL even in the control case. This is confusing because healthy baseline MG do not express these rod gene transcripts, so what is this picking up? Taken together, the conclusion that there are Rod-like MG is based on scRNA-seq data (which likely reflect ambient contamination) and these FISH images. I don't think these data warrant the conclusion that MG upregulate rod genes in response to CCA.

      Given the results of RNA in situ hybridization on isolated MG, we also revisited the RNA in situ hybridization results on retinal sections. We performed RNA in situ hybridization on retinal sections at 1 week post CCA treatment, expecting lower Gnat1 and Rho signals in ONL-localized MG than at 3 weeks and 4 months post CCA treatment. However, we observed similar levels at all three time points (data not shown). The lack of dynamic changes in rod gene expression also suggests contamination from tightly surrounding neighboring rods. Consequently, we have reinterpreted the scRNA-seq and RNA FISH data and withdrawn the conclusion that MG upregulated rod genes after CCA treatment. We thank the reviewer for pointing out this potential issue and helping us avoid an incorrect conclusion.

      Figure 5

      (14) Similar to the point above, this Glul probe seems odd: why is the signal present throughout the ONL but completely absent from the IPL? Glul should also be expressed in astrocytes; can you see it in the GCL? These retinas look cropped at the INL, and everything below is completely black. The whole retinal section should be shown. Antibodies to GS and many other MG proteins work in mouse, and IHC or western blots could better support this point.

      We have replaced the images in Figure 4 in the revised manuscript. Additionally, we have performed Sox9 antibody staining to demonstrate partial MG dedifferentiation following CCA treatment (Figure 5).

      Figure 6

      (15) Figure 6D does not show an Otx2+/TdTomato+ co-labeled cell. Otx2 fills the whole nucleus, as can be seen in examples from other MG-reprogramming papers in the field (Hoang, et al. 2020; Todd, et al. 2020; Palazzo, et al. 2022). In the Figure 6D example, the nucleus clearly extends well beyond the Otx2 signal, which probably comes from an overlapping cell. Other examples should be shown; however, considering that less than 1% of cells were putatively Otx2+, the safer interpretation is that these cells are not differentiating into neurons. At least 99.5% are not.

      We have replaced the image of the Otx2+Tdt+EdU+ cell with one showing the whole nucleus filled with strong Otx2 staining.

      (16) As above, Figure 6I is not convincingly co-labeled. HuC/D is an RNA-binding protein and unfortunately not always the clearest stain, but this looks like overlapping background haze in the INL. Other amacrine markers could be tested, but again, given the very low numbers, I think no neurogenesis is occurring.

      Since we didn’t find HuC/D+Tdt+EdU+ cells at 3 weeks post CCA treatment, we believe that the weak HuC/D+ staining in the MG daughter cells at 4 months is not background, but rather reflects an incomplete neurogenic switch. This suggests that the process of neurogenesis may be ongoing but not fully realized within the observed timeframe without additional stimuli.

      (17) In the text, the authors accidentally refer to Figure 6 as Figure 7.

      We thank the reviewer for pointing out the mistake. We will correct the mistake in the revised manuscript.

      Figure 7

      (18) I like this figure and the concept that additional MG can proliferate without destroying the retina or compromising vision. This is reminiscent of the chick MG reprogramming studies, in which MG proliferate in large numbers and often do not differentiate into neurons, yet still persist delaminated for long periods.

      General:

      (19) The title should be changed, as I don't believe there is any convincing evidence of neuronal regeneration. Understanding the barriers to MG cell-cycle re-entry is important, and I believe the authors did a good job in that respect; however, claiming neuronal regeneration from these data is an overstatement.

      We thank the reviewer for the suggestion. We have changed the title to “Simultaneous cyclin D1 overexpression and p27kip1 knockdown enable robust Müller glia cell cycle reactivation in uninjured mouse retina” in the revised manuscript.

      (20) This paper uses multiple mouse lines, and it is often confusing when the text and figures switch between models. I think it would help readers if the mouse strain were added to the graphical paradigm in each figure where a different mouse line is employed.

      We have labeled the mouse lines used in each experiment in the figures where appropriate.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mehmet Mahsum Kaplan et al. demonstrate that Meis2 expression in neural crest-derived mesenchymal cells is crucial for whisker follicle (WF) development, as WFs fail to develop in Wnt1-Cre;Meis2 cKO mice. Advanced imaging techniques effectively support the idea that Meis2 is essential for proper WF development and that nerves, while affected in Meis2 cKO, are dispensable for WF development and are not the primary cause of WF developmental failure. The study also reveals that although Meis2 deletion significantly downregulates Foxd1 in the mesenchyme, this is not the main reason for WF developmental failure. The paper presents valuable data on the role of mesenchymal Meis2 in WF development. However, further quantification and analysis of the WF developmental phenotype would strengthen the claim that Meis2 controls early WF development rather than causing a delay or arrest in development. Deeper analysis of the sequencing data could also help link Meis2 to downstream targets that directly affect the epithelial compartment.

      Strengths:

      (1) The authors describe a novel molecular mechanism involving mesenchymal Meis2 expression, which plays a crucial role in early WF development.

      (2) They employ multiple advanced imaging techniques to illustrate their findings beautifully.

      (3) The study clearly shows that nerves are not essential for WF development.

      We thank the reviewer for valuable comments that will help improve our study.

      Weaknesses:

      (1) The authors claim that Meis2 acts very early during development, as evidenced by a significant reduction in EDAR expression, one of the earliest markers of placode development. While EDAR is indeed absent from the lower panel of the Meis2 cKO in Figure 3C, multiple placodes still express EDAR in the upper two panels of the Meis2 cKO. The authors also present subsequent analysis at E13.5, showing one escaped follicle positive for SHH and Sox9 in Figures 1 and 3. Does this suggest that follicles are specified but fail to develop? Alternatively, could there be a delay in follicle formation? The increase in Foxd1 expression between E12.5 and E13.5 might also indicate delayed follicle development, or, as the authors suggest, follicles that have escaped the phenotype. The paper would benefit significantly from robust quantification to accompany the visual data, specifically quantifying EDAR, Sox9, and Foxd1 at different developmental stages. Additionally, analyzing later developmental stages could help distinguish between a delay or arrest in WF development and a complete failure to specify placodes.

      The earliest DC (FOXD1) and placodal (EDAR, LEF1) markers tested in this study were observed only in escaped WFs, whereas they were missing from expected WF sites in mutants. This was also reflected in the loss of typical placodal morphology in the mutant epithelium. On the other hand, escaped WFs developed normally, as shown by the analysis in Supp Fig 1A-B demonstrating their normal size. These data suggest that development of escaped WFs is not delayed, because delayed WFs would be expected to appear smaller. To strengthen this conclusion, we assessed whisker development at E18.5 in Meis2 cKO mice by EDAR staining; the results are shown in the newly added Supplementary Figure 2. This experiment revealed that the whisker phenotype persisted until E18.5; therefore, it cannot be explained by a developmental delay.

      Regarding quantification, we had already counted whiskers in controls and mutants at E12.5 and E13.5 in all whole-mount experiments, i.e. Shh ISH and SOX9 or EDAR whole-mount IFC. Pooling all these counts, the whisker number was reduced to 5.7 ± 2.0% at E12.5 and 17.1 ± 5.9% at E13.5 (lines 132-134).

      (2) The authors show that single-cell sequencing reveals a reduction in the pre-DC population, reduced proliferation, and changes in cell adhesion and ECM. However, these changes appear to affect most mesenchymal cells, not just pre-DCs. Moreover, since E12.5 already contains WFs at different stages of development, as well as pre-DCs and DCs, it is challenging to connect these mesenchymal changes directly to WF development. Did the authors attempt to re-cluster only Cluster 2 to determine whether a specific subpopulation is missing in Meis2 cKO? Alternatively, focusing on additional secreted molecules whose expression is disrupted across different clusters in Meis2 cKO could provide insights, especially since mesenchymal-epithelial communication is often mediated by secreted molecules. Did the authors include epithelial cells in the single-cell sequencing? If so, could they look for changes in mesenchymal-epithelial interactions (CellChat) to indicate a possible mechanism?

      We agree with the reviewer that the effects of Meis2 on cell proliferation and on the expression of cell adhesion and ECM markers are more general, because they occur throughout the underlying mesenchyme. Our genetic tools did not allow specific targeting of DCs or pre-DCs. Nonetheless, we believe that our data show that mesenchymal Meis2 is required for the initial steps of WF development, including Pc formation. Regarding the bioinformatic data, this dataset was taken from the larger dataset GSE262468 covering the whole craniofacial region, which resulted in very limited cell numbers in cluster 2 (DC): WT_E12_5 --> 28, WT_E13_5 --> 131, MUT_E12_5 --> 19, MUT_E13_5 --> 28. Unfortunately, such small cell numbers did not allow further sub-clustering, efficient normalization, integration, or conclusions from their transcriptional profiles. Although a number of interesting differentially expressed genes were identified (see supplementary datasets), none of them pointed convincingly to a plausible secreted-molecule candidate.

      We agree with the reviewer that CellChat analysis could provide a robust indication of mesenchymal-epithelial communication; however, our datasets included only the mesenchymal cell population (Wnt1-Cre2 progeny), and epithelial cells were excluded by FACS prior to scRNA-seq (Hudacova et al. https://doi.org/10.1016/j.bone.2024.117297).

      (3) The authors aim to link Meis2 expression in the mesenchyme with epithelial Wnt signaling by analyzing Lef1, bat-gal, Axin1, and Wnt10b expression. However, the changes described in the figures are unclear, and the phenotype appears highly variable, making it difficult to establish a connection between Meis2 and Wnt signaling. For instance, some follicles and pre-condensates are Lef1 positive in Meis2 cKO. Including quantification or providing a clearer explanation could help clarify the relationship between mesenchymal Meis2 and Wnt signaling in both epidermal and mesenchymal cells. Did the authors include epithelial cells in the sequencing? Could they use single-cell analysis to demonstrate changes in Wnt signaling?

      We have now analyzed changes in LEF1 staining intensity in the epithelium and in the upper dermis. These quantifications showed a considerable decline in the number of LEF1+ placodes in the epithelium, corresponding to the lower number of placodes. On the other hand, LEF1 intensity in the ‘escaped’ placodes was similar between controls and mutants. The LEF1 signal in the upper dermis is very strong overall, and its quantification did not reveal any changes in the DC or non-DC regions of the upper dermis. These data corroborate our conclusion that mesenchymal Meis2 is not crucial for dermal WNT signaling but is required for induction of LEF1 expression in the epithelium. However, once ‘escaper’ placodes appear, they display normal Wnt signaling in the Pc, the DC, and subsequent development. These quantitative data have been added to the revised manuscript (lines 247-260).

      (4) Existing literature, including studies on Neurog KO and NGF KO, as well as the references cited by the authors, suggest that nerves are unlikely to mediate WF development. While the authors conduct a thorough analysis of WF development in Neurog KO, further supporting this notion, this point may not be central to the current work. Additionally, the claim that Meis2 influences trigeminal nerve patterning requires further analysis and quantification for validation.

      We agree with the reviewer that analysis of the Neurogenin1 knockout mice should not be central to this report. Nonetheless, a thorough analysis of WF development in Neurog1 KO was needed to distinguish between two possible mechanisms: whether the whisker phenotype in Meis2 cKO results from (1) impaired nerve branching or (2) the function of Meis2 in the mesenchyme. We will modify the text accordingly to make this clearer to readers. We also agree that nerve branching was not extensively analyzed in the current study, but two mutant samples were provided (Fig 1 and Supp Videos), reflecting the consistency of the phenotype (see also Machon et al. 2015). This section was not central to this report either, but it led us to focus fully on the mesenchyme. We think that Meis2 function in cranial nerve development is very interesting and deserves a separate study.

      We have edited the introduction to better reflect the literature (lines 70-79).

      (5) Meis2 expression seems reduced but has not entirely disappeared from the mesenchyme. Can the authors provide quantification?

      We attempted to quantify MEIS2 staining in the snout dermis. However, background fluorescence made reliable quantification challenging. Additionally, because the dermal region in which MEIS2 expression is relevant for inducing WF formation is not yet known, we were unable to define the regions to analyze. Instead, we have added three images from multiple regions of snout sections stained with MEIS2 antibody in Supplementary Figure 1C. We believe these newly added images make our conclusion that MEIS2 is efficiently deleted in the mutants more convincing.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Kaplan et al. study mesenchymal Meis2 in whisker formation and the links between whisker formation and sensory innervation. To this end, they used conditional deletion of Meis2 with the Wnt1 driver. Whisker development was arrested at the placode induction stage in Meis2 conditional knockouts, leading to the absence of expression of placodal genes such as Edar, Lef1, and Shh. The authors also show that branching of the trigeminal nerves innervating whisker follicles was severely affected, but that whiskers did form in the complete absence of trigeminal nerves.

      Strengths:

      The analysis of Meis2 conditional knockouts convincingly shows a lack of whisker formation and all epithelial whisker/hair placode markers were analyzed. Using Neurog1 knockout mice, the authors show equally convincingly that whiskers and teeth develop in the complete absence of trigeminal nerves.

      We thank the reviewer for valuable comments that will help improve our study.

      Weaknesses:

      The manuscript does not provide much mechanistic insight into why loss of mesenchymal Meis2 leads to the absence of whisker placodes. Using a previously generated scRNA-seq dataset, they show that two early markers of dermal condensates, Foxd1 and Sox2, are downregulated in Meis2 mutants. However, given that placodes and dermal condensates do not form in the mutants, this is not surprising, and their absence in the mutants does not provide any direct link between Meis2 and Foxd1 or Sox2. (The absence of a structure evidently leads to the absence of its markers.)

      We apologize for the unclear explanation of our data. We meant that Meis2 is functionally upstream of Foxd1 because Foxd1 is reduced upon Meis2 deletion. This means that during WF formation, Meis2 operates before Foxd1 induction; it does not necessarily mean that Meis2 directly controls Foxd1 expression. We agree with the reviewer that Foxd1 and Sox2, as known DC markers, decline because the number of WFs declines. We wanted to convince readers that Meis2 operates very early in the GRN hierarchy during WF development. We also acknowledge that we provide limited mechanistic insight into Meis2 function as a transcription factor. We think that this limitation does not diminish the value of the report, which shows an indispensable role of Meis2 in WF development.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The text could benefit from editing.

      We have proofread the text.

      Some information is missing from the materials and methods section - a description of sequenced cells, the ISH protocol used, etc.

      The Methods section has been updated. The single-cell experiments were performed and described in detail by Hudacova et al. 2025 (https://doi.org/10.1016/j.bone.2024.117297); we used these datasets for the scRNA analysis, which is described sufficiently in that paper. A reference for the standard in situ hybridization protocol has been added.

      Reviewer #2 (Recommendations for the authors):

      In the Introduction of the paper, the authors raise the question of the role of innervation in whisker follicle induction: "It has been speculated that early innervation plays a role in initiating WF formation (ref. 1)" and "this revives the previous speculations that axonal network may be involved in WF positioning". However, the authors do not mention that Wrenn & Wessells, 1984 (reference 1 in the manuscript) reached exactly the opposite conclusion, stating e.g. "Nerve trunks and branches are present in the maxillary process well before any sign of vibrissa formation. Because innervation is so widespread there appears to be no immediate temporal correlation between the outgrowth of a nerve branch to a site and the generation of a vibrissa there. Furthermore, at the time just prior to the formation of the first follicle rudiment, there is little or no nerve branching to the presumptive site of that first follicle while branches are found more dorsally where vibrissae will not form until later." Therefore, I find the reference to the paper by Wrenn & Wessells somewhat misleading. The fact that whisker follicles develop in ex vivo cultured whisker pads further suggests that innervation is unlikely to play a role in whisker follicle induction.

      The Introduction also hints at a role of innervation in tooth induction but does not refer to the literature showing exactly the opposite. Based on the evidence, it rather appears that the developing tooth regulates the establishment of its own nerve supply, not that nerves regulate the induction of tooth development.

      In my opinion, the Introduction should be partially rewritten to better reflect the literature.

      The introduction has been revised to better reflect the literature on the role of innervation in WF and tooth development (lines 70-87).

      The authors conclude that Meis2 is upstream of Foxd1, but the evidence is based on the lack of Foxd1 expression in Meis2 mutants. However, as whiskers do not form, evidently all markers are also absent. More direct evidence of Meis2 being upstream of Foxd1 (or Sox2) should be presented to consolidate the conclusions.

      We have already responded to this point above in the Weaknesses section. The text has been modified so that the interpretation is correct (lines 407-409).

      Other comments:

      Author contributions state that XX performed experiments but the author list does not include anyone with such initials.

      This error has been corrected in revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their supportive comments about our modeling approach and conclusions, and for raising several valid concerns, which we address briefly below. In addition, a detailed, point-by-point response to the reviewers’ comments is provided below, along with the additions and edits we have made to the revised manuscript.

      Concerns about model’s biological realism and impact on interpretations

      The goal of this paper was to use an interpretable and modular model to investigate the impact of varying sensorimotor delays. Aspects of the model (e.g. layered architecture, modularity) are inspired by biology; at the same time, necessary abstractions and simplifications (e.g. using an optimal controller) are made for interpretability and generalizability, and they reflect common approaches from past work. The hypothesized effects of certain simplifying assumptions are discussed in detail in Section 3.5. Furthermore, the modularity of our model allows us to readily incorporate additional biological realism (e.g. biomechanics, connectomics, and neural dynamics) in future work. In the revision, we have added citations and edits to the text to clarify these points.

      Concerns that the model is overly complex

      To investigate the impact of sensorimotor delays on locomotion, we built a closed-loop model that recapitulates the complex joint trajectories of fly walking. We agree that locomotion models face a tradeoff between simplicity/interpretability and realism; therefore, we developed a model that was as simple and interpretable as possible while still reasonably recapitulating joint trajectories and generalizing to novel simulation scenarios. Along these lines, we also did not select a model that primarily recreates empirical data, as this would hinder generalizability and add unnecessary complexity. We do not think these design choices are significant weaknesses of the model; in fact, few comparable models account for all joints involved in locomotion, and fewer explicitly compare model kinematics with kinematics from data. We have added citations and edits to the text to clarify these points in the revision.

      Concerns about the validity of the Kinematic Similarity (KS) metric to evaluate walking

      We chose to incorporate only the first two PCA modes in the KS metric because the kernel density estimator performs poorly on high-dimensional data. Our primary use of this metric was to indicate whether the simulated fly continues walking in the presence of perturbations. For technical reasons, it is not feasible to perform equivalent experiments on real walking flies, which is one of the reasons we explore this phenomenon with the model. We note the dramatic shift from walking to nonwalking as delay increases (Figure 5). To be thorough, in the revision we have investigated the effect of incorporating additional PCA modes and whether this affects the interpretation of our results. We have also expanded the discussion and presentation of the KS metric to clarify its purpose in this study. We agree with the reviewers that the KS metric is too coarse to reflect fine details of joint kinematics; indeed, in the unperturbed case, we evaluate our model’s performance using other metrics based on comparisons with empirical data (Figures 2, 7, 8).
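The exchange describes the KS score as the likelihood, under a kernel density estimate of real walking data in the space of the first two PCA modes, that a window of joint-angle trajectories belongs to that distribution. The exact formula is not given in this response, so the Python sketch below is illustrative only and rests on stated assumptions: PCA is fitted on real joint-angle frames (one row per frame), a Gaussian KDE with a Scott's-rule bandwidth (the default bandwidth rule of SciPy's `gaussian_kde`) is fitted to the projected real data, and the score is the mean log-density of the projected frames in a window. The names `fit_pca`, `GaussianKDE`, and `kinematic_similarity` are hypothetical.

```python
import numpy as np


def fit_pca(X, n_modes=2):
    """Return the mean and the top `n_modes` principal axes of X.

    Assumed layout: one row per frame, one column per joint angle.
    """
    mu = X.mean(axis=0)
    # Rows of Vt are principal axes, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_modes]


def project(X, mu, axes):
    """Project frames onto the retained principal axes."""
    return (X - mu) @ axes.T


class GaussianKDE:
    """Gaussian KDE whose kernel covariance is the data covariance scaled by Scott's factor."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)  # shape (n, d)
        n, d = self.data.shape
        factor = n ** (-1.0 / (d + 4))  # Scott's rule
        self.cov = np.atleast_2d(np.cov(self.data, rowvar=False)) * factor**2
        self.inv_cov = np.linalg.inv(self.cov)
        _, logdet = np.linalg.slogdet(self.cov)
        self.log_norm = -0.5 * (d * np.log(2 * np.pi) + logdet)

    def logpdf(self, points):
        """Log of the average Gaussian kernel value at each query point."""
        diff = points[:, None, :] - self.data[None, :, :]  # (m, n, d)
        mahal = np.einsum("mnd,de,mne->mn", diff, self.inv_cov, diff)
        log_k = self.log_norm - 0.5 * mahal  # (m, n)
        # Log-mean-exp over the n kernels, shifted by the row max for stability.
        peak = log_k.max(axis=1, keepdims=True)
        return (peak + np.log(np.exp(log_k - peak).mean(axis=1, keepdims=True))).ravel()


def kinematic_similarity(real_angles, window_angles, n_modes=2):
    """Mean log-density of a window's frames under a KDE of real walking data."""
    mu, axes = fit_pca(real_angles, n_modes)
    kde = GaussianKDE(project(real_angles, mu, axes))
    return float(kde.logpdf(project(window_angles, mu, axes)).mean())
```

Calling `kinematic_similarity` with `n_modes=3` or higher is one way to probe the reviewers' question about the discarded modes, with the caveat noted above that kernel density estimates degrade as dimensionality grows.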

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors present a novel, multi-layer computational model of motor control to produce realistic walking behaviour of a Drosophila model in the presence of external perturbations and under sensory and motor delays. The novelty of their model of motor control is that it is modular, with divisions inspired by the fly nervous system, with one component based on deep learning while the rest are based on control theory. They show that their model can produce realistic walking trajectories. Given the mostly reasonable assumptions of their model, they convincingly show that the sensory and motor delays present in the fly nervous system are the maximum allowable for robustness to unexpected perturbations.

      Their fly model outputs torque at each joint in the leg, and their dynamics model translates these into movements, resulting in time-series trajectories of joint angles. Inspired by the anatomy of the fly nervous system, their fly model is a modular architecture that separates motor control at three levels of abstraction:

      (1) oscillator-based model of coupling of phase angles between legs,

      (2) generation of future joint-angle trajectories based on the current state and inputs for each leg (the trajectory generator), and

      (3) closed-loop control of the joint-angles using torques applied at every joint in the model (control and dynamics).

      These three levels of abstraction ensure coordination between the legs, future predictions of desired joint angles, and corrections to deviations from desired joint-angle trajectories. The parameters of the model are tuned in the absence of external perturbations using experimental data of joint angles of a tethered fly. A notable disconnect from reality is that the dynamics model used does not model the movement of the body and ground contacts as is the case in natural walking, nor the movement of a ball for a tethered fly, but instead something like legs moving in the air for a tethered fly.

      In order to validate the realism of the generated walking trajectories, the authors compare various attributes of simulated and real tethered fly trajectories and show qualitative and quantitative similarities, including with a novel metric termed Kinematic Similarity (KS). The KS score of a trajectory is a measure of the likelihood that the trajectory belongs to the distribution of real trajectories estimated from the experimental data. While such a metric is a useful tool for validating the quality of simulated data, there is some room for improvement in the actual computation of this score. For instance, the KS score is computed for any given time window of walking simulation using a fraction of the information in the joint-angle trajectories. It is unclear whether the remaining information in the joint-angle trajectories, which is not used to compute the KS score, can be ignored when validating the realism of simulated walking trajectories.

      The authors validate simulated walking trajectories generated by the trained model under a range of sensorimotor delays and external perturbations. The trained model is shown to generate realistic joint-angle trajectories in the presence of external perturbations as long as the sensorimotor delays are constrained within a certain range. This range of sensorimotor delays is shown to be comparable to experimental measurements of sensorimotor delays, leading to the conclusion that the fly nervous system is just fast enough to be robust to perturbations.

      Strengths:

      This work presents a novel framework to simulate Drosophila walking in the presence of external perturbations and sensorimotor delay. Although the model makes some simplifying assumptions, it has sufficient complexity to generate new, testable hypotheses regarding motor control in Drosophila. The authors provide evidence for realistic simulated walking trajectories by comparing simulated trajectories generated by their trained model with experimental data using a novel metric proposed by the authors. The model proposes a crucial role in future predictions to ensure robust walking trajectories against external perturbations and motor delay. Realistic simulations under a range of prediction intervals, perturbations, and motor delays generating realistic walking trajectories support this claim. The modular architecture of the framework provides opportunities to make testable predictions regarding motor control in Drosophila. The work can be of interest to the Drosophila community interested in digitally simulating realistic models of Drosophila locomotion behaviors, as well as to experimentalists in generating testable hypotheses for novel discoveries regarding neural control of locomotion in Drosophila. Moreover, the work can be of broad interest to neuroethologists, serving as a benchmark in modelling animal locomotion in general.

      We thank the reviewer for their positive comments.

      Weaknesses:

      As the authors acknowledge in their work, the control and dynamics model makes some simplifying assumptions about Drosophila physics/physiology in the context of walking. For instance, the model does not incorporate ground contact forces and inertial effects of the fly's body. It is not clear how these simplifying assumptions would affect some of the quantitative results derived by the authors. The range of tolerable values of sensorimotor delays that generate realistic walking trajectories is shown to be comparable with sensorimotor delays inferred from physiological measurements. It is unclear if this comparison is meaningful in the context of the model's simplifying assumptions.

      We now discuss how some of these assumptions affect the quantitative results in the section “Towards biomechanical and neural realism”. We reproduce the relevant sentences below:

      “The inclusion of explicit leg-ground contact interactions would also make it harder for the model to recover when perturbed, because perturbations during walking often occur upon contact with the ground (e.g. the ground is slippery or bumpy).”

      “We anticipate that the increased sensory resolution from more detailed proprioceptor models and the stability from mechanical compliance of limbs in a more detailed biomechanical model would make the system easier to control and increase the allowable range of delay parameters. Conversely, we expect that modeling the nonlinearity and noise inherent to biological sensors and actuators may decrease the allowable range of delay parameters.”

      The authors propose a novel metric, Kinematic Similarity (KS), to distinguish realistic walking trajectories from unrealistic walking trajectories. Defining such an objective metric to evaluate the model's predictions is a useful exercise, and could potentially be applied to benchmark other computational animal models that are proposed in the future. However, the KS score proposed in this work is calculated using only the first two PCA modes, which cumulatively account for less than 50% of the variance in the joint angles. It is not obvious that including the information in the remaining PCA modes would leave the log-likelihood relative to the real walking data unchanged.

      The primary reason we designed the KS metric was to determine whether the simulated fly continues walking in the presence of perturbations. We initially limited the analysis of the KS to the first 2 principal components. For completeness, we now investigate the additional principal components in Appendix 9 and the effect of evaluating KS with different numbers of components in Appendix 10. 

      Overall, the results look similar when including additional components for impulse perturbations. For stochastic perturbations, the range of similar walking decreases as we increase the number of components used to evaluate walking kinematics. Comparing this with Appendix 9, which shows that higher components represent higher frequencies of the walking cycle, we conclude that at the edge of stability for delays (where the sum of sensory and actuation delays is about 40 ms), flies can continue walking but with impaired higher frequencies (relative to no perturbations) during and after perturbation.

      We added the following text in the methods:

      “We chose 2 dimensions for PCA for two key reasons. First, these 2 dimensions alone accounted for a large portion of the variance in the data (52.7% total, with 42.1% for first component and 10.6% for second component). There was a big drop in variance explained from the first to the second component, but no sudden drop in the next 10 components (see Appendix 9). Second, the KDE procedure only works effectively in low-dimensional spaces, and the minimal number of dimensions needed to obtain circular dynamics for walking is 2. We investigate the effect of varying the number of dimensions of PCA in Appendix 10.”
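
      The KS computation described above (project joint angles onto the first principal components of real walking, fit a kernel density estimate, and score trajectories by log-likelihood) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: PCA is done with a plain SVD, the density is SciPy's `gaussian_kde` with its default bandwidth, and the score of a window is taken as the mean log-density over its frames. The paper's exact bandwidth, windowing and normalization may differ.

```python
import numpy as np
from scipy.stats import gaussian_kde


def fit_ks_model(real_angles, n_components=2):
    """Fit PCA (via SVD) and a Gaussian KDE to real joint-angle frames.

    real_angles: array of shape (n_frames, n_joints) from real walking.
    """
    mean = real_angles.mean(axis=0)
    centered = real_angles - mean
    # Rows of vt are principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T        # shape (n_frames, n_components)
    kde = gaussian_kde(scores.T)            # gaussian_kde expects shape (d, N)
    return mean, components, kde


def ks_score(angles, mean, components, kde):
    """Mean log-density of a window of frames under the real-walking KDE."""
    scores = (angles - mean) @ components.T
    return float(np.mean(kde.logpdf(scores.T)))
```

      Raising `n_components` mimics the comparison in Appendix 10, with the caveat noted above that kernel density estimates become unreliable as the dimension grows.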

      (Note that we have corrected the percentage of variance accounted for by the principal components, as these numbers were from an older analysis prior to the first draft.)

      We also reference Appendix 10 in the results:

      “We observed that robust walking was not contingent on the specific values of motor and sensory delay, but rather the sum of these two values (Fig. 5E). Furthermore, as delay increases, higher frequencies of walking are impacted first before walking collapses entirely (Appendix 10).”
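
      The observation that robustness depends on the sum of the two delays has a simple analogue in a toy scalar loop with static feedback, sketched below. This is not the paper's model (which combines a trajectory generator, an optimal controller and a predictor); the plant, gain and delays are hypothetical and measured in timesteps, not milliseconds. In this toy loop the closed-loop recursion is x(t+1) = a x(t) - b k x(t - d_m - d_s), so only the total delay enters, and for a = b = 1, k = 0.5 the loop is stable for a total delay of 1 step but unstable for 5 steps.

```python
import numpy as np


def simulate_delayed_feedback(motor_delay, sensory_delay, a=1.0, b=1.0, k=0.5,
                              x0=1.0, n_steps=400):
    """Toy scalar loop: x(t+1) = a*x(t) + b*u(t - motor_delay),
    with feedback u(t) = -k*x(t - sensory_delay). Signals before t = 0 are zero."""
    x = np.zeros(n_steps + 1)
    u = np.zeros(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        # The controller only sees a stale measurement of the state.
        s = t - sensory_delay
        u[t] = -k * x[s] if s >= 0 else 0.0
        # The plant only receives a stale command.
        m = t - motor_delay
        applied = u[m] if m >= 0 else 0.0
        x[t + 1] = a * x[t] + b * applied
    return x
```

      Swapping the motor and sensory delays, e.g. (2, 3) versus (3, 2), yields identical trajectories in this toy loop.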

      Reviewer #2 (Public Review):

      Summary:

      In this study, Karashchuk et al. develop a hierarchical control system to control the legs of a dynamic model of the fly. They intend to demonstrate that temporal delays in sensorimotor processing can destabilize walking and that the fly's nervous system may be operating with as long of delays as could possibly be corrected for.

      Strengths:

      Overall, the approach the authors take is impressive. Their model is trained using a huge dataset of animal data, which is a strength. Their model was not trained to reproduce animal responses to perturbations, but it successfully rejects small perturbations and continues to operate stably. Their results are consistent with the literature, that sensorimotor delays destabilize movements.

      Weaknesses:

      The model is sophisticated and interesting, but the reviewer has great concerns regarding this manuscript's contributions, as laid out in the abstract:

      (1) Much simpler models can be used to show that delays in sensorimotor systems destabilize behavior (e.g., Bingham, Choi, and Ting 2011; Ashtiani, Sarvestani, and Badri-Sproewitz 2021), so why create this extremely complex system to test this idea? The complexity of the system obscures the results and leaves the reviewer wondering if the instability is due to the many, many moving parts within the model. The reviewer understands (and appreciates) that the authors tested the impact of the delay in a controlled way, which supports their conclusion. However, the reviewer thinks the authors did not use the most parsimonious model possible, and as such, leave many possible sources for other causes of instability.

      We thank the reviewer for this observation — we agree that we did not make the goal of the work quite clear. The goal of this paper was to build an interpretable and generalizable model of fly walking, which was then used to investigate varying sensorimotor delays in the context of locomotion. To this end, we used a modular model to recreate walking kinematics, and then investigated the effect of delays on locomotion. Locomotion in itself is a complex phenomenon — thus, we have chosen a model that is complex enough to reasonably recapitulate joint trajectories, while remaining interpretable.

      We have clarified this in the text near the end of the introduction:

      “Here, we develop a new, interpretable, and generalizable model of fly walking, which we use to investigate the impact of varying sensorimotor delays in Drosophila locomotion.”

      We also emphasize the investigation of sensorimotor delays in the context of locomotion in the beginning of the “Effect of sensory and motor delays on walking” section:

      “... we used our model to investigate how changing sensory and motor delays affects locomotor robustness.”

      We also remark that while they are very relevant papers for our work, neither of the prior papers focus on locomotion: the first involves a 2D balance model of a biped, and the second involves drop landings of quadrupeds.

      Lastly, we note that the investigation of delay is not the only use for this model: in the future, this model can also be used to study other aspects of locomotion such as the role of proprioceptive feedback (see “Role of proprioceptive feedback in fly walking” section). The layered framework of the model can also be extended to other animals and locomotor strategies (see “Layered model produces robust walking and facilitates local control” section).

      (2) In a related way, the reviewer is not sure that the elements the authors introduced reflect the structure or function of the fly's nervous system. For example, optimal control is an active field of research and is behind the success of many-legged robots, but the reviewer is not sure what evidence exists that suggests the fly ventral nerve cord functions as an optimal controller. If this were bolstered with additional references, the reviewer would be less concerned.

      We thank the reviewer for the comment — we have now further clarified how our model elements reflect the fly’s nervous system. The elements we introduce are plausible but only loosely analogous to the fly’s nervous system. While we draw parallels from these elements to anatomy (e.g. in Fig 1A-B, and in the first paragraph of the Results section), we do not mean to suggest that these functional elements directly correspond to specific structures in the fly’s nervous system. A substantial portion of the suggested future work (see “Towards biomechanical and neural realism”) aims to bridge the gap between these functional elements and fly physiology, which is beyond the scope of this work. 

      We have added clarifying text to the Results section:

      “While the model is inspired by neuroanatomy, its components do not strictly correspond to components of the nervous system --- the construction of a neuroanatomically accurate model is deferred to future work (see Discussion).”

      Regarding optimal control specifically: it is a theoretical model that predicts various aspects of human motor control, and there is evidence that the human nervous system implements it (Todorov and Jordan, 2002; Scott, 2004; Berret et al., 2011). Based on this, we assume that optimal control is also a reasonable model for motor control implemented by the fly nervous system. Fly movement makes use of proprioceptive feedback signals (Mendes et al., 2013; Pratt et al., 2024; Berendes et al., 2016), and optimal control is a plausible mechanism for incorporating feedback signals into movement.

      We have added the following clarifying text in the Results section: 

      “The optimal controller layer maintains walking kinematics in the presence of sensorimotor delays and helps compensate for external perturbations. This design was inspired by optimal control-based models of movements in humans (Todorov and Jordan, 2002; Scott, 2004; Berret et al., 2011).”

      (3) "The model generates realistic simulated walking that matches real fly walking kinematics...". The reviewer appreciates the difficulty in conducting this type of work, but the reviewer cannot conclude that the kinematics "match real fly walking kinematics". The range of motion of several joints is 30% too small compared to the animal (Figure 2B) and the reviewer finds the video comparisons unpersuasive. The reviewer would understand if there were additional constraints, e.g., the authors had designed a robot that physically could not complete the prescribed motions. However the reviewer cannot think of a reason why this simulation could not replicate the animal kinematics with arbitrary precision, if that is the goal.

      We agree with the reviewer that the model-generated kinematics are not perfectly indistinguishable from real walking kinematics, and now clarify this in the text. We also agree with the reviewer that one could build a model that precisely replicates real kinematics, but as they intuit, that was not our goal. Our goal was to build a model that both replicates animal kinematics, and is interpretable and generalizable (which allows us to investigate what happens when perturbations and varying sensorimotor delays are introduced). There is a trade-off between realism and generalizability — a simulation that fully recreates empirical data would require a model that is completely fit to data, which is likely to be more complex (in terms of parameters required) and less generalizable to novel scenarios. We have made design choices that result in a model that balances these trade-offs. We do not consider this to be a weakness of the model; in fact, few comparable models account for all joints involved in locomotion, and fewer explicitly compare model kinematics with kinematics from data.

      We have tempered the language in the abstract:

      “The model generates realistic simulated walking that resembles real fly walking kinematics”

      The tempered statement, we believe, is a fair characterization of the walking — it resembles but does not perfectly match real kinematics.

      We have also introduced clarifying text in the introduction:

      “Overall, existing walking models focus on either kinematic or physiological accuracy, but few achieve both, and none consider the effect of varying sensorimotor delays. Here, we develop a new, interpretable, and generalizable model of fly walking, which we use to investigate the impact of varying sensorimotor delays in Drosophila locomotion.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Potential typo on page 5:

      2.1.2 Joint kinematics trajectory generator

      Paragraph 4, last line: Original text - ".....it also estimates the current phase". Suggested correction - "...it also estimates the current phase velocity"

      Done

      Potential typo on page 8:

      2.3 Model maintains walking under unpredictable external perturbations.

      Paragraph 3, line 2: Original text - "...brief, unexpected force (e.g. legs slipping on an unstable surface)".

      Consider replacing force with motion, or providing an example of a force as opposed to displacement (slipping).

      Done

      Potential typo on page 8:

      2.3 Model maintains walking under unpredictable external perturbations.

      Paragraph 3, line 4: Original text - "The magnitude of this velocity is drawn from a normal distribution...".

      Is this really magnitude? If so, please discuss how the sign (+/-) is assigned to velocity, and how the normal distribution is centred so as to sample only positive values representing magnitude.

      Indeed the magnitude of the velocity is drawn from a normal distribution. A positive or negative sign is then assigned with equal odds. We have added text to clarify this:

      “The sign of the velocity was drawn separately so that there is equal likelihood for negative or positive perturbation velocities.”
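
      The sampling scheme described in this response (magnitude drawn from a normal distribution, sign drawn separately with equal odds) can be sketched as below. The mean and standard deviation are hypothetical parameters, and the sketch assumes the mean is large relative to the standard deviation; otherwise some draws of the "magnitude" would themselves be negative, and the sign would no longer be set solely by the separate draw.

```python
import numpy as np


def sample_perturbation_velocities(n, mean_magnitude, sd_magnitude, rng):
    """Draw n perturbation velocities: normally distributed magnitude,
    sign chosen independently as -1 or +1 with equal probability."""
    magnitude = rng.normal(mean_magnitude, sd_magnitude, size=n)
    sign = rng.choice([-1.0, 1.0], size=n)
    return sign * magnitude
```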

      Page 8:

      2.3 Model maintains walking under unpredictable external perturbations.

      In Paragraph 5: Why is the data reduced to only 2 dimensions? Could higher order PCA modes (cumulatively accounting for more than 50% variance in the data) not have distinguishing information between realistic and unrealistic walking trajectories?

      We provide a longer response for this in the public review above.

      Page 11:

      Why wouldn't a system trained in the presence of external perturbations perform better? What is the motivation to remove external perturbations during training?

      We agree that a system trained in the presence of external perturbations would probably perform better — however, we do not have data that contains walking with external perturbations. Nothing was removed — all the data used in this study involve a fly walking without perturbations.

      We have added a clarification:

      “our model maintains realistic walking in the presence of external dynamic perturbations, despite being trained only on data of walking without perturbations (no perturbation data was available).”

      Page 16:

      4.1 Tracking joint angles of D. melanogaster walking in 3D.

      Paragraph 1: Readers who wish to collect similar data might benefit from specifying the exposure time, animal size in pixels (or camera sensor format and field of view), in addition to the frame rate. Alternatively, consider mentioning the camera and lens part numbers provided by the manufacturer.

      This is a good point. We have updated the text to include these specifications:

      “We obtained fruit fly D. melanogaster walking kinematics data following the procedure previously described in (Karashchuk et al, 2021). Briefly, a fly was tethered to a tungsten wire and positioned on a frictionless spherical treadmill ball suspended on compressed air. Six cameras (Basler acA800-510um with Computar zoom lens MLM3X-MP) captured the movement of all of the fly's legs at 300 Hz. The fly size in pixels ranges from about 300x300 up to 700x500 pixels across the 6 cameras. Using Anipose, we tracked 30 keypoints on the fly, which are the following 5 points on each of the 6 legs: body-coxa, coxa-femur, femur-tibia, and tibia-tarsus joints, as well as the tip of the tarsus.”

      Potential typos on page 18:

      4.3.3 Training procedure

      Paragraph 2, line 1: Original text - "..(, p)"

      Do the authors mean "...(, )"

      Paragraph 2, line 2: Original text - "... (,, v, p)" Do the authors mean "... (,, v, )"?

      Paragraph 3, line 3: Original text - "... (,, v, p)" Do the authors mean "... (,, v, )"?

      Thank you for pointing out this issue. We have now fixed the phase p to be \phi to be consistent with the rest of the text.

      Paragraph 3, line 3: Original text - "...(\hat\theta)"

      Do the authors mean "(\theta_d)"? If not, please discuss the difference between \hat\theta and \theta_d.

      Thank you for pointing this out. \hat \theta and \theta_d were used interchangeably which is confusing. We have standardized our reference to the desired trajectory as \theta_d throughout the text.

      Page 19:

      Typo after eqn. (6):

      Original text: "where x := q - q, ... A and B are Jacobians with respect to...."

      Correction: "where x := q - q, ... A_c and B_c are Jacobians with respect to...."

      Similar corrections in eqn. 7 and eqn. 8: A and B should be replaced with A_c and B_c.

      Done
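
      The steps referred to in these comments (build an ODE, linearize about a fixed point to obtain Jacobians A_c and B_c, discretize, then compute an optimal feedback gain) can be illustrated on a hypothetical single joint. Everything below is an assumption for illustration: the joint dynamics and their stiffness and damping values, finite-difference Jacobians, zero-order-hold discretization, and a standard infinite-horizon discrete LQR gain from SciPy's `solve_discrete_are`. It is not the paper's leg model or its controller.

```python
import numpy as np
from scipy.linalg import expm, solve_discrete_are


def joint_dynamics(state, torque, stiffness=4.0, damping=0.5):
    """Hypothetical single joint: q'' = -stiffness*sin(q) - damping*q' + torque."""
    q, qdot = state
    return np.array([qdot, -stiffness * np.sin(q) - damping * qdot + torque[0]])


def jacobians(f, x0, u0, eps=1e-6):
    """Central finite-difference Jacobians A_c = df/dx and B_c = df/du at (x0, u0)."""
    n, m = len(x0), len(u0)
    A = np.zeros((n, n))
    B = np.zeros((n, m))
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = eps
        A[:, i] = (f(x0 + dx, u0) - f(x0 - dx, u0)) / (2 * eps)
    for j in range(m):
        du = np.zeros(m)
        du[j] = eps
        B[:, j] = (f(x0, u0 + du) - f(x0, u0 - du)) / (2 * eps)
    return A, B


def zoh_discretize(A_c, B_c, dt):
    """Zero-order-hold discretization via the exponential of [[A_c, B_c], [0, 0]]."""
    n, m = B_c.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A_c
    M[:n, n:] = B_c
    E = expm(M * dt)
    return E[:n, :n], E[:n, n:]


def lqr_gain(A, B, Q, R):
    """Discrete LQR gain K such that u = -K x minimizes sum(x'Qx + u'Ru)."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
```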

      Page 19, eqn. (10b):

      Should the last term be q_d(t+T) as opposed to q_d(t+1)?

      No: in fact (10a) contains the typo: it should be y(t+1) as opposed to y(t+T). This has been fixed.

      Page 19

      The authors' detailed description of the initial steps leading up to the dynamics model, involving the construction of the ODE, linearizing the system about the fixed point makes the text broadly accessible to the general reader. Similarly, adding some more description of the predictive model (eqn. 11 - 15) could improve the text's accessibility and the reader's appreciation for the model. This is especially relevant since the effects of sensorimotor delay and external perturbations, which are incorporated in the control and dynamics model, form a major contribution to this work. What do the matrices F, G, L, H, and K look like for the Drosophila model? Are there any differences between the model in Stenberg et al. (referenced in the paper) and the authors' model for predictive control? Are there any differences in the assumptions made in Stenberg et al. compared to the model presented in this work? The readers would likely also benefit from a figure showing the information flow in the model, and describing all the variables used in the predictive control model in eqn. 11 through eqn. 15 (analogous to Figure 1 in Stenberg et al. (2022)). Such a detailed description of the control and dynamics model would help the reader easily appreciate the assumptions made in modelling the effects of sensorimotor delay and external perturbations.

      Done
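
      For intuition about the predictive part of the controller, the core idea of compensating a sensory delay of d steps can be shown with a standard model-based state predictor: roll the stale measurement forward through the discrete-time linear model using the commands issued since it was taken. This is a generic sketch, not a reproduction of eqns. 11-15 or of the F, G, L, H and K matrices following Stenberg et al. (2022). It assumes a noise-free, exactly known linear model, in which case the prediction is exact.

```python
import numpy as np


def predict_current_state(A, B, delayed_state, recent_inputs):
    """Roll a delayed state measurement forward to the present.

    delayed_state: x(t - d), the most recent (stale) measurement.
    recent_inputs: [u(t - d), ..., u(t - 1)], the commands issued since then.
    Returns the model-based estimate of x(t).
    """
    x_hat = np.asarray(delayed_state, dtype=float)
    for u in recent_inputs:
        # One step of x(k+1) = A x(k) + B u(k).
        x_hat = A @ x_hat + B @ np.atleast_1d(u)
    return x_hat
```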

      Page 20:

      Eqn. 12: Should z(t+1) be z(t+T) instead?

      Similar comment for eqn. 14

      No: we made a mistake in (10a); there should be no (t+T) terms; all terms should be (t+1) terms to reflect a standard discrete-time difference equation.

      Eqn. 13: r(t) can be defined explicitly

      Done

      4.5 Generate joint trajectories of the complete model with perturbations

      Paragraph 2, line 2: Please read the previous comment

      \hat \theta and \theta_d were previously used interchangeably which is confusing. We have standardized our reference to the desired trajectory as \theta_d throughout the text.

      Original text - "Every 8 timesteps, we set :=...."

      Does this mean \theta_d is set to \theta? If so, the motivation for this is not clear.

      We mean that \theta_d is set to be equal to \theta. We have replaced “:=” with “=” for clarity.

      General comments for the authors:

      Could the authors discuss the assumptions regarding Drosophila physiology implied in the control model?

      The control model is primarily included as a plausible functional element of the fly’s nervous system, and as such implies minimal assumptions on physiology itself. The main assumption, which is evident from the description of the model components, is that the fly uses proprioceptive feedback information to inform future movements.

      We have added clarifying text to the Results section:

      “While the model is inspired by neuroanatomy, its components do not strictly correspond to components of the nervous system --- the construction of a neuroanatomically accurate model is deferred to future work (see Discussion).”

      The authors acknowledge the absence of ground contact forces in the model. It is probably worth discussing how this simplification may affect inferences regarding the acceptable range of sensorimotor delay in generating realistic walking trajectories.

      We agree, and discuss how some of these assumptions affect the quantitative results in the section “Towards biomechanical and neural realism”. We replicate the relevant sentences below:

      “The inclusion of explicit leg-ground contact interactions would also make it harder for the model to recover when perturbed, because perturbations during walking often occur upon contact with the ground (e.g. the ground is slippery or bumpy).”

      The effects of other simplifications are also mentioned in the same section.

      Can the authors provide an insight into why the use of a second derivative of joint angles as the output of the trajectory generator () leads to more realistic trajectories (4.3.1 Model formulation, paragraph 1)?

      Does the use of a second-order derivative of joint angles lead to drift error because of integration?

      Could the distribution of θd produced be out of the domain due to drift errors? Could this affect the performance of the neural network model approximating the trajectory generator?

      We are not sure why the second derivative works better than the first derivative. It is possible that modeling the system as a second order differential equation gives the network more ability to produce complex dynamics. 

      As can be seen in the example time series in Figures 2 and 3 and supplemental videos, there is no drift error from integration, so it is unlikely to affect the performance of the neural network.

      What does the model's failure (quantified by a low KS score) look like in the context of fly dynamics? What do the joint angles look like for low values of KS score? Does the fly fall down, for example?

      Since the model primarily considers kinematics, a low KS score means that kinematics are unrealistic, e.g. the legs attain unnatural angles or configurations. Examples of this can be seen in videos 4-7 (linked from Appendix 1 of the paper), as well as in the bottom row of Fig. 5, panel A. Here, at 40ms of motor delay, L2 femur rotation is seen to attain values that far exceed the normal ranges. 

      We have added a small clarification in the caption of Fig.5 panel A:

      “low KS indicates that the perturbed walking deviates from data and results in unnatural angles (as seen at 40ms motor delay)”

      We remark that since our simulations do not incorporate contact forces (as the reviewer remarks above, we simulate something like legs moving in the air for a tethered fly), the fly cannot “fall down” per se. However, if forces were incorporated then yes, these unrealistic kinematics would correspond to a fly that falls down or is no longer walking.

      Reviewer #2 (Recommendations For The Authors):

      L49: "Computational models of locomotion do not typically include delay as a tunable parameter, and most existing models of walking cannot sustain locomotion in the presence of delays and external perturbations". This remark confuses the reviewer.

      (1) If models do not "typically" include delay as a tunable parameter, this suggests that atypical models do. Which models do? Please provide references.

      Our initial phrasing was confusing. We meant to say that most models do not include delay, and some models do include delay as a fixed value (rather than a tunable value). We clarify in the updated text, which is replicated below:

      “Computational models of locomotion typically have not included delays as a tunable parameter, although some models have included them as fixed values (Geyer and Herr, 2010; Geijtenbeek et al., 2013).”

      (2) Has the statement that most existing models cannot sustain locomotion with delays been tested? If so, provide references. If not, please remove this statement or temper the language.

      Since most models don’t include delays, they cannot be run in scenarios with delays. We clarify in the updated text, which is replicated below:

      “Computational models of locomotion have not typically included delays. Some have included delay as a fixed value rather than a tunable parameter (Geyer and Herr, 2010; Geijtenbeek et al., 2013). However, in general, the impact of sensorimotor delays on locomotor control and robustness remains an underexplored topic in computational neuroscience.”

      L57: "two of six legs lift off the ground at a time" - Two legs are off the ground at any time, but they do not "lift off" simultaneously in the fruit fly. To lift off simultaneously, contralateral leg pairs would need to be 33% out of phase with one another, but they are almost always 50% out of phase.

      Thank you for pointing out this oversight. We have updated the text accordingly:

      “Flies walk rhythmically with a continuum of stepping patterns that range from tetrapod (where two of six legs are off the ground at a time) to tripod (where three of six legs are off the ground at a time)”
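
      The tripod case can be made concrete with an idealized phase model in which the groups {L1, R2, L3} and {R1, L2, R3} are offset by half a cycle, so every contralateral pair is 50% out of phase. Treating each leg as swinging during a fixed fraction of its own cycle, and the value of that fraction, are hypothetical simplifications; with a swing fraction of 0.5, exactly three legs are in swing at every phase.

```python
# Idealized tripod: two groups of three legs, offset by half a cycle.
TRIPOD_OFFSETS = {"L1": 0.0, "R2": 0.0, "L3": 0.0, "R1": 0.5, "L2": 0.5, "R3": 0.5}


def legs_in_swing(phase, offsets=TRIPOD_OFFSETS, swing_fraction=0.5):
    """Return the legs in swing at a global phase given in cycles, in [0, 1).

    Simplifying assumption: each leg swings during the first `swing_fraction`
    of its own cycle and is in stance for the rest.
    """
    return [leg for leg, off in offsets.items() if (phase + off) % 1.0 < swing_fraction]
```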

      L88: "a new model of fly walking" - The intention of the authors is to produce a model from which to learn about walking in the fly, is that correct? The reviewer has read the paper several times now and wants to be sure that this is the authors' goal, not to engineer a control system for an animation or a robot.

      Indeed, this is our goal. We were previously unclear about this, and have made text edits to clarify this — we provide a longer response for this in the public review above (see (1)).

      L126: "These desired phases are synchronized across pairs of legs to maintain a tripod coordination pattern, even when subject to unpredictable perturbations." - Does the animal maintain tripod coordination even when perturbed? In the reviewer's experience, flies vary their interleg coordination all the time. The reviewer would also expect that if perturbed strongly (as the supplemental videos show), the animal would adapt its interleg coordination in response. The reviewer finds this assumption to be a weak point in using these disturbances to explore animal locomotion.

      We do not know exactly how flies may react to our mechanical perturbations. However, we may hypothesize based on past papers. 

      Couzin-Fuchs et al (2015) apply a mechanical perturbation to walking cockroaches. They find that the tripod is temporarily broken immediately after the perturbation, but the cockroach recovers to a full tripod within one step cycle.

      DeAngelis et al (2019) apply optogenetic perturbations to fly moonwalker neurons that drive backward walking. Flies slow down following perturbation, but then recover after 200ms (about 2-3 steps) to their original speed (on average). 

      Thus, we think it is reasonable to model a fly’s internal phase coupling to maintain tripod and for its intended speed to remain the same even after a perturbation. 

      We do agree with the reviewer that it is plausible a fly might also slow down or even stop after a perturbation and we do not model such cases. We have added some text to the discussion on future work:

      “Future work may also model how higher-level planning of fly behavior interacts with the lower-level coordination of joint angles and legs. Walking flies continuously change their direction and speed as they navigate the environment (Katsov et al, 2017; Iwasaki et al 2024). Past work shows that flies tend to recover and walk at similar speeds following perturbations (DeAngelis et al, 2019), but individual flies might still change walking speed, phase coupling, or even transition to other behaviors, such as grooming. Modeling these higher-level changes in behavior would involve combining our sensorimotor model with models for navigation (Fisher 2022) or behavioral transitions (Berman et al, 2016).”

      L136: "...to output joint torques to the physical model of each leg" - Is this the ultimate output of the nervous system? Muscles are certainly not idealized torque generators. There are dynamics related to activation and mechanics. The reviewer is skeptical that this is a model of neural control in the animal, because the computation of the nervous system would be tuned to account for all these additional dynamics.

      We agree with the reviewer that joint torques are not the ultimate output of the nervous system. We use a torque controller because it is parsimonious, and serves our purpose of creating an interpretable and modular locomotion model.

      We also agree that muscles are an important consideration — we make mention of them later on in the paper under the section “Toward biomechanical and neural realism”, where we state “Another step toward biological realism is the incorporation of explicit dynamical models of proprioceptors, muscles, tendons, and other biomechanical aspects of the exoskeleton.”

      Our goal is not to directly model neural control of the animal. We have introduced text clarifications to emphasize this — we provide a longer response for this in the public review above (see (2)).

      L143: "To train the network from data, we used joint kinematics of flies walking on a spherical treadmill..." This is an impressive approach, but then the reviewer is confused about why the kinematics of the model are so different from those of the animal. The animal takes longer strides at a lower frequency than the model. If the model were trained with data, why aren't they identical? This kind of mismatch makes the reviewer think the approach in this paper is too complicated to address the main problem.

      The design of our trajectory generator model is one of the simplest for reproducing the output of a dynamical system. It consists of a multilayer perceptron model that models the phase velocity and joint angle accelerations at each timestep. All of its inputs are observable and interpretable: the current joint angles, joint angle derivatives, desired walking speed, and phase angle. 
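
      The trajectory generator described above can be sketched as a small multilayer perceptron that maps the current joint angles, joint velocities, desired speed and phase to joint accelerations and phase velocity, which are then integrated forward in time. Layer sizes, the tanh nonlinearity, the (cos φ, sin φ) phase encoding, the semi-implicit Euler integrator and the 1/300 s timestep (chosen only to match the 300 Hz recordings) are all assumptions of this sketch, and the weights are random rather than trained.

```python
import numpy as np


class TrajectoryGeneratorMLP:
    """Hypothetical MLP: (theta, theta_dot, v, phi) -> (theta_ddot, phi_dot).

    Weights are random here; in practice they would be fit to walking data.
    """

    def __init__(self, n_joints, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        n_in = 2 * n_joints + 3   # angles, velocities, speed, cos(phi), sin(phi)
        n_out = n_joints + 1      # joint accelerations, phase velocity
        self.W1 = rng.normal(scale=1.0 / np.sqrt(n_in), size=(hidden, n_in))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=(n_out, hidden))
        self.b2 = np.zeros(n_out)
        self.n_joints = n_joints

    def __call__(self, theta, theta_dot, v, phi):
        # Phase enters as (cos, sin) so the input is continuous across the 2*pi wrap.
        x = np.concatenate([theta, theta_dot, [v, np.cos(phi), np.sin(phi)]])
        h = np.tanh(self.W1 @ x + self.b1)
        out = self.W2 @ h + self.b2
        return out[: self.n_joints], out[self.n_joints]


def rollout(model, theta, theta_dot, v, phi, dt, n_steps):
    """Integrate the second-order model with semi-implicit Euler steps."""
    thetas = [theta.copy()]
    for _ in range(n_steps):
        theta_ddot, phi_dot = model(theta, theta_dot, v, phi)
        theta_dot = theta_dot + dt * theta_ddot
        theta = theta + dt * theta_dot
        phi = (phi + dt * phi_dot) % (2 * np.pi)
        thetas.append(theta.copy())
    return np.array(thetas), phi
```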

      We chose this model for ease of interpretability, integration with the optimal controller, and to allow for generalization across perturbations. Given all of these constraints, this is the best model of desired kinematics we could obtain. We note that the simulated kinematics do match real fly kinematics qualitatively (Figure 2A and supplemental videos) and are close quantitatively (Figure 2B and C). We speculate that matching the animals’ strides at all walking frequencies may require explicitly modeling differences across individual flies. We leave the design and training of more accurate (but more complex) walking models for future work.

      We add some further discussion about fitting kinematics in the discussion:

      “Although we believe our model matches the fly walking sufficiently for this investigation, we do note that our model still underfits the joint angle oscillations in the walking cycle of the fly (see Figure 2 and Appendix 3). More precise fitting of the joint angle kinematics may come from increasing the complexity of the neural network architecture, improving the training procedure based on advances in imitation learning (Hussein et al., 2018), or explicitly accounting for individual differences in kinematics across flies (Deangelis et al., 2019; Pratt et al., 2024).”

      Figure 2: The reviewer thinks the violin plots in Figure 2C are misleading. Joint angles could be greater or less than 0, correct? If so, why not keep the sign (pos/neg) in the data? Taking the absolute value of the errors and "folding over" the distribution results in some strange statistics. Furthermore, the absolute value would shroud any systematic bias in the model, e.g., joint angles are always too small. The reviewer suggests the authors plot the un-rectified data and simply include 2 dashed lines, one at 5.56 degrees and one at -5.56 degrees.

      These violin plots are averages of errors over all phases within each speed. We chose to do this to summarize the errors across all phase angle plots, which are shown in detail in Appendices 3 and 4.

      For the reviewer, we have added a plot of the raw errors across all phase angle plots in Appendix 5, E.

      L156: Should "\phi\dot" be "\phi"?

      We originally had a typo: we said “phase” when we meant “phase velocity”. This has been fixed. \phi\dot is correct.

      L160: "This control is possible because the controller operates at a higher temporal frequency than the trajectory generator...". This statement concerns the reviewer. To the reviewer, this sounds like the higher-level control system communicates with the "muscles" at a higher frequency than the low-level control system, which conflicts with the hierarchical timescales at which the nervous system operates. Or do the authors mean that the optimal controller can perform many iterations in between updates from the trajectory generator level? If so, please clarify.

      We mean that the optimal controller can perform many iterations in between updates from the trajectory generator level. The text has been clarified:

      “This control is possible because the controller operates at a higher temporal frequency than the trajectory generator in the model. The controller can perform many iterations (and reject disturbances) in between updates to and from the trajectory generator.”

      L225: "We considered two types of perturbations: impulse and persistent stochastic". Are these realistic perturbations? Realistic perturbations such as a single leg slipping, or the body movement being altered would produce highly correlated joint velocities.

      These perturbations are not fully realistic. Nonetheless, we illustrate how they relate to real perturbations in the subsequent text of the paper, and we restrict our simulations to ranges that would be biologically plausible (see Appendix 7). We agree that realistic perturbations would produce highly correlated joint accelerations and velocities, whereas our perturbations produce random joint accelerations.

      L265: "...but they are difficult to manipulate experimentally..." This is true, but it can and has been done. The authors should cite:

      Bässler, U. (1993). The femur-tibia control system of stick insects-A model system for the study of the neural basis of joint control. Brain Research Reviews, 18(2), 207-226. 

      Thank you for the suggestion, we have incorporated it into the text at the end of the referenced sentence.

      L274: "...since the controller can effectively compensate for large delays by using predictions of joint angles in the future". But can the nervous system do this? Or, is there a reason to think that the nervous system can? The reviewer thinks the authors need stronger justification from the literature for their optimal control layer.

      To clarify, this sentence describes a feature of the model’s behavior when no external perturbations are present. This is not directly relevant to the nervous system, since organisms do not typically exist in an environment free of perturbations — we are not suggesting that the nervous system does this.

      In response to the question of whether the nervous system can compensate for delays using predictions: we know that delays are present in the nervous system, perturbations exist in the environment, and that flies manage to walk in spite of them. Thus, some type of compensation must exist to offset the effects of delays (the reviewer themself has provided some excellent citations that study the effects of delays). In our model, we use prediction as the compensation mechanism — this is one of our central hypotheses. We further discuss this in the section “Predictive control is critical for responding to perturbations due to motor delay”.

      L319: "The formulation of a modular, multi-layered model for locomotor control makes new experimentally-testable hypotheses about fly motor control...". What testable hypotheses are these? The authors should explicitly state them. They are not clear to the reviewer, especially given the nonphysiological nature of the control system and the mechanics.

      A number of testable hypotheses are mentioned throughout the Discussion section:

      “Our model predicts that at the same perturbation magnitude, walking robustness decreases as delays increase. This could be experimentally tested by altering conduction velocities in the fly, for example by increasing or decreasing the ambient temperature (Banerjee et al., 2021). If a warmer ambient temperature decreases delays in the fly, but fly walking robustness remains the same in response to a fixed perturbation, this would indicate a stronger role for central control in walking than our modeling results suggest.”

      “In our model, robust locomotion was constrained by the cumulative sensorimotor delay. This result could be experimentally validated by comparing how animals with different ratios of sensory to motor delays respond to perturbations. Alternatively, it may be possible to manipulate sensory vs. motor delays in a single animal, perhaps by altering the development of specific neurons or ensheathing glia (Kottmeier et al., 2020). If sensory and motor delays have significantly different effects on walking quality, then additional compensatory mechanisms for delays could play a larger role than we expect, such as prediction through sensory integration, mechanical feedback, or compensation through central control.”

      “we hypothesize that removing proprioceptive feedback would impair an insect's ability to sustain locomotion following external perturbations.”

      “We propose that fly motor circuits may encode predictions of future joint positions, so the fly may generate motor commands that account for motor neuron and muscle delays.”

      L323: "...and biomechanical interactions between the limb and the environment". In the reviewer's experience, the primary determinant of delay tolerance is the mechanical parameters of the limb: inertia, damping, and parallel elasticity. For example, in Ashtiani et al. 2021, equation 5 shows exactly how this comes about: the delay changes the roots and poles of the control system. This is why the reviewer is confused by the complexity of the model in this submission; a simpler model would explain why delays cannot be tolerated in certain circumstances.

      We were previously unclear about the goal of the model, and have made text edits to clarify this — we provide a longer response for this in the public review above (see (1)).

      L362: Another highly relevant reference here would be Sutton et al. 2023.

      Done

      L366: Szczecinski et al. 2018 is hardly a "model"; it is mostly a description of experimental data. How about Goldsmith, Szczecinski, and Quinn 2020 in B&B? Their model of fly walking has pattern-generating elements that are coordinated through sensory feedback. In their model, motor activation is also altered by sensory feedback. The reviewer thinks the statement "Models of fly walking have ignored the role of feedback" is inaccurate and their description of these references should be refined.

      Thank you for the suggestion; we have tempered the language and revised this section to include more references, including the suggested one — text is replicated below. 

      “Many models of fly walking ignore the role of feedback, relying instead on central pattern generators (Lobato-Rios et al., 2022; Szczecinski et al., 2018; Aminzare et al., 2018) or metachronal waves (Deangelis et al., 2019) to model kinematics. Some models incorporate proprioceptive feedback, primarily as a mechanism that alters timing of movements in inter-leg coordination (Goldsmith et al., 2020; Wang-Chen et al., 2023).”

      We remark that Szczecinski et al. does include a model that replicates data without using sensory feedback, so we think it is fair to include.

      L371: "...highly dependent on proprioceptive feedback for leg coordination during walking." What about Berendes et al. 2016, which showed that eliminating CS feedback from one leg greatly diminished its ability to coordinate with the other legs? This suggests that even flies depend on sensory feedback for proper coordination, at least in some sense.

      Interesting suggestion – we have integrated it into the text a little further down, where it better fits:

      “Silencing mechanosensory chordotonal neurons alters step kinematics in walking Drosophila (Mendes et al., 2013; Pratt et al., 2024). Additionally, removing proprioceptive signals via amputation interferes with inter-leg coordination in flies at low walking speeds (Berendes et al., 2016).”

      L426: "The layered model approach also has potential applications for bio-mimetic robotic locomotion.". How fast can this model be computed? Can it run faster than real-time? This would be an important prerequisite for use as a robot control system.

      The model should be able to run quite fast, as it involves only:

      (1) Addition, subtraction, matrix multiplication, and sinusoidal computation on scalars (for the phase coordinator and optimal controller).

      (2) Neural network inference with a relatively small network (for the trajectory generator).

      Whether this can run in real time depends on the hardware capabilities of the specific robot and the frequency requirements; it is possible to run this on a desktop or smaller embedded device.

      We do note that the model needs to first be set up and trained before it can be run, which takes some time (see panel D of Figure 1).

      L432: "...which is a popular technique in robotics.". Please cite references supporting this statement.

      We have added citations: the text and relevant citations are reproduced below:

      “... which is a popular technique in robotics (Hua et al., 2021; Johns, 2021).”

      Hua J, Zeng L, Li G, Ju Z. Learning for a robot: Deep reinforcement learning, imitation learning, transfer learning. Sensors. 2021; 21(4):1278

      Johns E. Coarse-to-fine imitation learning: Robot manipulation from a single demonstration. In: 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE; 2021. p. 4613–4619.

      L509: "We find that the phase offset across legs is not modulated across walking speeds in our dataset". This is a surprising result to the reviewer. Looking at Figure 6C, the reviewer understands that there are no drastic changes in coordination with speed, but there are certainly some changes, e.g., L1-R3, L3-R1. In the reviewer's experience, even very small changes in interleg phasing can change the visual classification of walking from "tripod" to "tetrapod" or "metachronal". Furthermore, several leg pairs do not reside exactly at 0 or \pi radians apart, e.g., L1-L3, L2-L3, R1-R3, R2-R3. In conclusion, the reviewer thinks that setting the interleg coordination to tripod in all cases is a large assumption that requires stronger justification (or, should be eliminated altogether).

      We made a simplifying assumption of a tripod coordination across all speeds. The change in relative phase coordination across speeds is indeed relatively small and additionally we see little change in our results across forward speeds (see Figures 4B, 5C and 5D). 

      We have added text to clarify this assumption and what could be changed for future studies in the methods:

      “We estimate $\bar \phi_{ij}$ from the walking data by taking the circular mean over phase differences of pairs of legs during walking bouts. We find that the phase offset across legs is not strongly modulated across walking speeds in our dataset (see Appendix 2), so we model $\bar \phi_{ij}$ as a single constant independent of speed. In future studies, this could be a function of forward and rotation speeds to account for fine phase modulation differences.”
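      A minimal Python sketch of the circular-mean estimate described above, assuming per-frame phase angles (in radians) for two legs within a walking bout are already available. The function names are hypothetical and this is not the authors' implementation. Averaging unit vectors rather than raw angles avoids the wraparound error an arithmetic mean would make near ±π.

```python
import cmath


def circular_mean(angles):
    """Circular mean of angles in radians, returned in [-pi, pi]."""
    # Average the unit vectors e^{i*theta}, then take the angle of the resultant.
    z = sum(cmath.exp(1j * a) for a in angles) / len(angles)
    return cmath.phase(z)


def mean_phase_offset(phase_i, phase_j):
    """Circular mean of the frame-by-frame phase differences phi_i - phi_j."""
    return circular_mean([a - b for a, b in zip(phase_i, phase_j)])


# Two angles just either side of +/-pi: their arithmetic mean is 0,
# whereas their circular mean lies near +/-pi.
print(circular_mean([3.1, -3.1]))
```

      For a tripod gait, legs in opposite tripods would be expected to give an offset near ±π and legs within the same tripod an offset near 0.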

      L581: "of dimension...". Should the asterisk be replaced by \times? The asterisk makes the reviewer think of convolution. This change should be made throughout this paragraph.

      Good point, done.

      Figure 6: Rotational velocities in all 3 sections are reported in mm/s, but these units do not make sense. Rotational velocities must be reported in rad/s or deg/s.

      The rotational velocity in mm/s corresponded to the tangential velocity of the ball the fly walked on. We agree that this does not easily generalize across setups, so we have updated the figure to report rotational velocities in rad/s.
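      For readers converting between the two conventions: a point on the ball's surface at distance r from the rotation axis (the ball radius, for a point on the equator relative to that axis) moves at v = ωr, so ω = v/r. The ball radius is not stated in this response, so the value below is a placeholder assumption, and the function name is hypothetical.

```python
import math


def tangential_to_angular(v_mm_per_s, ball_radius_mm):
    """Convert ball-surface tangential speed (mm/s) to angular velocity (rad/s).

    Uses omega = v / r, valid for a point on the ball's equator
    relative to the rotation axis.
    """
    if ball_radius_mm <= 0:
        raise ValueError("ball radius must be positive")
    return v_mm_per_s / ball_radius_mm


# Placeholder radius for illustration only; not taken from the paper.
omega = tangential_to_angular(10.0, ball_radius_mm=4.5)
print(omega, math.degrees(omega))  # rad/s, then deg/s
```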

      L619: The reviewer is unconvinced by using only 2 principal components of the data to compare the model and animal kinematics. The authors state on line 626 that the 2 principal components do not capture 56.9% of the variation in the data, which seems like a lot to the reviewer. This is even more extreme considering that the model has 20 joints, and the authors are reducing this to 2 variables; the reviewer can't see how any of the original waveforms, aside from the most fundamental frequencies, could possibly be represented in the PCA dataset. If the walking fly models looked similar to each other, the reviewer could accept that this method works. But the fact that this method says the kinematics are similar, but the motion is clearly different, leads the reviewer to suspect this method was used so the authors could state that the data was a good match.

      Our primary use of the KS metric was to indicate whether the simulated fly continues walking in the presence of perturbations, hence we limited the analysis of the KS to the first 2 principal components. 

      For completeness, we investigate the principal components in Appendix 9 and the effect of evaluating KS with different numbers of components in Appendix 10. 

      The results look similar across components for impulse perturbations. For stochastic perturbations, the range of conditions yielding similar walking decreases as we increase the number of components used to evaluate walking kinematics. Comparing this with Appendix 9, which shows that higher components represent higher frequencies of the walking cycle, we conclude that at the edge of stability for delays (where the sum of sensory and actuation delays is about 40 ms), flies can continue walking but with impaired higher frequencies (relative to no perturbations) during and after perturbation.

      We add text in the methods:

      “We chose 2 dimensions for PCA for two key reasons. First, these 2 dimensions alone accounted for a large portion of the variance in the data (52.7% total, with 42.1% for the first component and 10.6% for the second component). There was a big drop in variance explained from the first to the second component, but no sudden drop in the next 10 components (see Appendix 9). Second, the KDE procedure only works effectively in low-dimensional spaces, and the minimal number of dimensions needed to obtain circular dynamics for walking is 2. We investigate the effect of varying the number of dimensions of PCA in Appendix 10.”
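      A minimal numpy sketch of how explained-variance figures like those quoted above can be computed, assuming the kinematic data are arranged as a samples × features matrix (for example, joint angles per frame). The exact feature set and preprocessing used in the paper are not specified here, and the function names and stand-in data are hypothetical. PCA is computed via the singular value decomposition of the mean-centered data.

```python
import numpy as np


def explained_variance_ratio(X, k=2):
    """Fraction of total variance captured by each of the first k principal components."""
    Xc = X - X.mean(axis=0)                  # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, in descending order
    var = s ** 2                             # proportional to component variances
    return var[:k] / var.sum()


def project(X, k=2):
    """Scores of each sample on the first k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))  # stand-in data: 500 frames x 20 joint angles
print(explained_variance_ratio(X, k=2), project(X).shape)
```

      The 2-D scores returned by `project` are the kind of low-dimensional representation on which a KDE can then be fitted.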

      (Note that we have corrected the percentage of variance accounted for by the principal components, as these numbers were from an older analysis prior to the first draft.)

      We also reference Appendix 10 in the results:

      “We observed that robust walking was not contingent on the specific values of motor and sensory delay, but rather the sum of these two values (Fig. 5E). Furthermore, as delay increases, higher frequencies of walking are impacted first before walking collapses entirely (Appendix 10).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors introduce DIPx, a deep learning framework for predicting synergistic drug combinations for cancer treatment using the AstraZeneca-Sanger (AZS) DREAM Challenge dataset. While the approach is innovative, I have the following concerns and comments which hopefully will improve the study's rigor and applicability, making it a more powerful tool in the real clinical world.

      We thank the reviewer for recognizing the innovative aspects of DIPx and for sharing their valuable comments to further refine and strengthen our study. Those comments are carefully addressed in the following point-by-point response.

      (1) Test Set 1 comprises combinations already present in the training set, likely leading to overfitting. The model might show inflated performance metrics on this test set due to prior exposure to these combinations, not accurately reflecting its true predictive power on unknown data, which is crucial for discovering new drug synergies. The testing approach reduces the generalizability of the model's findings to new, untested scenarios.

      From a clinical perspective, it is useful to test whether a known (previously tested) combination can work for a new patient, which is the purpose of Test Set 1. There is no danger of overfitting here, because the test set is completely independent of the discovery set: had we only discovered a false positive, the test set would not have more power than expected under the null. Predicting the effectiveness of unknown drug combinations (Test Set 2) is indeed an important and more challenging goal of synergy prediction, but it is statistically a distinct problem. The two test sets were previously designed by the AZS DREAM Challenge [PMID: 31209238].

      We have performed cross-validation on the dataset and demonstrated that the performance of DIPx on Test Set 1 is not a result of overfitting. Indeed, Figure 2—figure supplement 1 shows the 10-fold cross-validation results for the training set. The median Spearman correlation between the predicted and observed Loewe scores across the 10 folds of cross-validation is 0.48, which is close to the correlation of 0.50 in Test Set 1 (red star). We have added the cross-validation results to the “Validation and Comparisons in the AZS Dataset” section (page 4).
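      The cross-validation check above (median Spearman correlation across 10 folds) can be sketched as follows. This is a minimal numpy illustration under stated assumptions: `fit_predict` is a hypothetical callable that trains on one split and predicts another, the linear least-squares model and synthetic data are stand-ins rather than DIPx (which is random-forest-based), and the rank computation here does not average ties as `scipy.stats.spearmanr` does.

```python
import numpy as np


def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks (ties not averaged)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])


def cv_median_spearman(X, y, fit_predict, k=10, seed=0):
    """Median Spearman correlation between predicted and observed y over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        y_hat = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(spearman(y_hat, y[test_idx]))
    return float(np.median(scores))


def least_squares(X_train, y_train, X_test):
    """Stand-in model: linear least squares with an intercept (not DIPx)."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ coef


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + 0.5 * rng.normal(size=200)
print(cv_median_spearman(X, y, least_squares))
```

      Comparing this cross-validated median with the held-out test correlation is the logic of the argument: a model that merely memorized its training data would score well on training folds it had seen but not on held-out folds.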

      (2) The model struggles with predicting synergies for drug combinations not included in its training data (showing only a Spearman correlation of 0.26 in Test Set 2). This limits its potential for discovering new therapeutic strategies. Utilizing techniques such as transfer learning or expanding the training dataset to encompass a wider range of drug pairs could help to address this issue.

      We agree that this is an important limitation for the discovery of new therapeutic strategies. While transfer learning or expanding the training dataset could indeed help address this issue, implementing these approaches would require access to more comprehensive data, which is currently limited due to the scarcity of drug combination datasets. As more drug combination data become available in the future, we plan to expand the training set to better cover a wider range of drug combinations and apply the transfer learning method to improve prediction accuracy. We have added a discussion on this in the Discussion Section.

      (3) The use of pan-cancer datasets, while offering broad applicability, may not be optimal for specific cancer subtypes with distinct biological mechanisms. Developing subtype-specific models or adjusting the current model to account for these differences could improve prediction accuracy for individual cancer types.

      We agree with the reviewer that the current settings of DIPx might not be optimal for specific cancers due to cancer heterogeneity. However, building subtype-specific models is currently constrained by limited data availability, which in turn restricts their predictive power. In the Discussion section, we mention this as one of DIPx's limitations and suggest future improvements in cancer-specific models.

      (4) Line 127, "Since DIPx uses only molecular data, to make a fair comparison, we trained TAJI using only molecular features and referred to it as TAJI-M.". TAJI was designed to use both monotherapy drug-response and molecular data, and likely won't be able to reach maximum potential if removing monotherapy drug-response from the training model. It would be critical to use the same training datasets and then compare the performances. From Figure 6 of TAJI's paper (Li et al., 2018, PMID: 30054332) , i.e., the mean Pearson correlation for breast cancer and lung cancer is around 0.5 - 0.6.

      It is true that using monotherapy drug responses can enhance the performance of TAIJI, as described in its original paper. In fact, TAIJI builds separate prediction modules for molecular data and monotherapy drug-response data, then combines their results to obtain the final prediction. In our paper we prioritize the exploration of molecular mechanisms in drug combinations while achieving performance comparable to the molecular model of TAIJI. DIPx can be expected to achieve similarly improved performance if we integrate the monotherapy drug-response data using the same approach.

      My major concerns were listed in the public review. Here are some writing issues:

      (5) Some content in the Results section looks like a discussion: e.g., L129, "The extra information from the use of monotherapy data in TAJI is rather small, approximately 10% increase in the overall Spearman correlation, and, of course, we could also use such data in DIPx, so it is more convenient and informative to focus the comparisons on prediction based on molecular data alone."; L257, "As we discuss above, to get synergy, the two drugs in a combination theoretically should not have the same target. However, there is of course no guarantee that two drugs that do not share target genes can produce synergy.".

      We have revised the texts and moved them to the Discussion section.  

      Reviewer #2 (Public Review):

      Trac, Huang, et al used the AZ Drug Combination Prediction DREAM challenge data to make a new random forest-based model for drug synergy. They make comparisons to the winning method and also show that their model has some predictive capacity for a completely different dataset. They highlight the ability of the model to be interpretable in terms of pathway and target interactions for synergistic effects. While the authors address an important question, more rigor is required to understand the full behavior of the model.

      We thank the reviewer for his/her time and effort in carefully reading the manuscript and acknowledging the significance of the study.

      Major Points

      (1) The authors compare DIPx to TAJI, the winning method of the DREAM challenge; to make the comparison on molecular features alone, they retrain TAJI without the monotherapy data inputs to create TAJI-M. They mention that "of course, we could also use such data in DIPx...", but they never show the behaviour of DIPx with these data. The authors need to demonstrate that this statement holds true or else compare it to the full TAJI.

      This is similar to point 4 raised by Reviewer 1 regarding the exclusive use of molecular data in DIPx. In fact, TAIJI uses separate prediction modules for molecular data and drug-response data, which are then combined to obtain the final results. While integrating monotherapy drug data could enhance DIPx’s overall performance (for example, by simply replacing TAIJI’s molecular model with DIPx in the full TAIJI to achieve comparable results), this is not the primary goal of DIPx. Our focus is on exploring the potential molecular mechanisms of drug action. Using only molecular data allows for more convenient and intuitive inference of pathway importance compared to integrating multiple data types.

      We have revised the related text with the discussion in section “Validation and comparisons in the AZS dataset” of the main text.

      (2) It would be neat to see how the DIPx feature importance changes with monotherapy input. For most realistic scenarios in which these models are used robust monotherapy data do exist.

      Indeed, some existing models incorporate monotherapy data into their predictions; for example, a recent study [PMID: 33203866] uses only monotherapy data to predict drug combinations. TAIJI, as discussed in Point 1, uses separate models for monotherapy and molecular data. In general, both data types can be integrated into a single prediction model, allowing for the consideration of feature importance from both. While such an approach can highlight features contributing to predictive performance, the significance of a monotherapy feature does not necessarily indicate the activated pathways of a synergistic drug combination, which is the primary focus of our study. For this reason, we have excluded monotherapy data from DIPx.

      (3) In Figure 2, the authors compare DIPx and TAJI-M on various test sets. If I understood correctly, they also bootstrapped the training set with n=100 and reported all the model variants in many of the comparisons. While this is a nice way of showing model robustness, calculating p-values with bootstrapped data does not make sense in my opinion as by increasing the value of n, one can make the p-value arbitrarily small.

      The p-value should only be reported for the original models.

      The reviewer is correct that we cannot compute the p-value by using an independent two-sample test, because the bootstrap correlation values are based on the same data. However, p-values can still be computed to compare the two prediction models using the bootstrap. In principle, the bootstrap can be used to compute a confidence interval for the difference in correlations in the test set, and there is a close relationship between p-values and confidence intervals (see Pawitan, 2001, chapter 5, particularly p. 134). Specifically, in this case, we compute the p-value as follows:

      (1) For each bootstrap replicate, (i) compute the Spearman correlation between the predicted and observed scores in the test set for DIPx and TAIJI-M, denoted r1 and r2, respectively; (ii) compute the difference in the Spearman correlations, d = r1 - r2.

      (2) Repeat the bootstrap n = 100 times.

      (3) Compute the minimum of two proportions: the proportion of d < 0 and the proportion of d > 0.

      (4) The two-sided p-value is 2 × the minimum proportion in (3).

      To overcome the limited bootstrap sample size, we use the normal approximation in computing the proportions in (3). Note that with this method of computing the p-value, larger numbers of bootstrap replicates do not produce more significant results.
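      A minimal Python sketch of steps (3) and (4) under the normal approximation, assuming the n bootstrap differences d = r1 - r2 have already been computed. With m and s the mean and standard deviation of those differences, the proportion of d < 0 is approximated by Φ(-m/s) and the proportion of d > 0 by 1 - Φ(-m/s). The function name and example values are hypothetical, and the paper's implementation may differ in detail. Because s estimates the spread of d itself rather than the standard error of its mean, adding replicates does not drive the p-value toward zero.

```python
from statistics import NormalDist, mean, stdev


def bootstrap_p_value(diffs):
    """Two-sided p-value from bootstrap differences d = r1 - r2.

    The proportions of d < 0 and d > 0 are approximated with a normal
    distribution fitted to the bootstrap differences.
    """
    m = mean(diffs)
    s = stdev(diffs)  # spread of the bootstrap distribution, not the SE of its mean
    if s == 0:
        return 0.0 if m != 0 else 1.0
    prop_below = NormalDist().cdf((0 - m) / s)  # approx. proportion of d < 0
    prop_above = 1 - prop_below                 # approx. proportion of d > 0
    return min(1.0, 2 * min(prop_below, prop_above))


diffs = [0.02, 0.05, 0.08, 0.03, 0.07]  # hypothetical bootstrap differences
print(bootstrap_p_value(diffs))
```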

      We have re-computed the p-values using this method and added this text to the ‘Methods and Materials’ Section. 

      (4) From Figures 2 and 3, it appears DIPx is overfit on the training set, with large gaps in Spearman correlations between Test Set 2/ONeil set and Test Set 1. It also performs much better in cases where it has seen both compounds. Could the authors also compare TAJI on the ONeil dataset to show if it is as much overfit?

      The poor performance on the ONeil dataset is not due to overfitting as such, but more likely due to structural differences between the training and ONeil datasets. (To investigate overfitting, we have conducted a 10-fold cross-validation in the AZS training set. The median correlation between the predicted and observed Loewe scores across the ten folds is 0.48, which is comparable to the median of 0.50 in Test Set 1. Therefore, the model does not suffer from overfitting. We have added this cross-validation result to the Section “Validation and Comparisons in the AZS Dataset” (page 4).)

      We have now obtained TAIJI’s results on the ONeil dataset. TAIJI-M relies on a gene-gene interaction network to integrate the indirect drug targeting effects. This approach limits its applicability to new datasets, as it can only predict synergy scores for drug combinations present in the training dataset. Among the set of drug combinations present in the training set (n = 1102), both DIPx and TAIJI-M perform poorly, with Spearman correlations between predicted and observed synergy scores of 0.09 and 0.05, respectively.

      (Additional note: The original version of TAIJI-M uses gene expression, CNV, mutation, and methylation data. However, there are no methylation data in the ONeil dataset, so we retrained TAIJI-M without the methylation features. According to the final report of TAIJI in the challenge (https://www.synapse.org/Synapse:syn5614689/wiki/396206), Guan et al. reported that methylation features do not contribute to prediction performance in the post-challenge analysis. This means that retraining TAIJI-M without the methylation data will not materially affect the comparison between DIPx and TAIJI-M on the ONeil dataset.)

      Minor Points:

      (5) Pg 4, line 130: Citation needed for 10% contribution of monotherapy.

      (6) The general language of this paper is informal at times. I request the authors to refine it a bit.

      We thank the reviewer for pointing this out. We have added the appropriate citation for the statement and carefully revised the text to make it more formal.

      Reviewer #3 (Public Review):

      Summary:

      Predicting how two different drugs act together by looking at their specific gene targets and pathways is crucial for understanding the biological significance of drug combinations. Such combinations of drugs can lead to synergistic effects that enhance drug efficacy and decrease resistance. This study incorporates drug-specific pathway activation scores (PASs) to estimate synergy scores as one of the key advancements for synergy prediction. The new algorithm, Drug synergy Interaction Prediction (DIPx), developed in this study, uses gene expression, mutation profiles, and drug synergy data to train the model and predict synergy between two drugs and suggests the best combinations based on their functional relevance on the mechanism of action. Comprehensive validations using two different datasets and comparing them with another best-performing algorithm highlight the potential of its capabilities and broader applications. However, the study would benefit from including experimental validation of some predicted drug combinations to enhance its reliability.

      Strengths:

      The DIPx algorithm demonstrates the strengths listed below in its approach for personalized drug synergy prediction. One of its strengths lies in its utilization of biologically motivated cancer-specific (driver genes-based) and drug-specific (target genes-based) pathway activation scores (PASs) to predict drug synergy. This approach integrates gene expression, mutation profiles, and drug synergy data to capture information about the functional interactions between drug targets, thereby providing a potential biological explanation for the synergistic effects of combined drugs. Additionally, DIPx's performance was tested using the AstraZeneca-Sanger (AZS) DREAM Challenge dataset, especially in Test Set 1, where the Spearman correlation coefficient between predicted and observed drug synergy was 0.50 (95% CI: 0.47–0.53). This demonstrates the algorithm's effectiveness in handling combinations already in the training set. Furthermore, DIPx's ability to handle novel combinations, as evidenced by its performance in Test Set 2, indicates its potential for extrapolating predictions to new and untested drug combinations. This suggests that the algorithm can adapt to and make accurate predictions for previously unencountered combinations, which is crucial for its practical application in personalized medicine. Overall, DIPx's integration of pathway activation scores and its performance in predicting drug synergy for known and novel combinations underscore its potential as a valuable tool for personalized prediction of drug synergy and exploration of activated pathways related to the effects of combined drugs.

      Weaknesses:

      While the DIPx algorithm shows promise in predicting drug synergy based on pathway activation scores, it's essential to consider its limitations. One limitation is that the algorithm's performance was less accurate when predicting drug synergy for combinations absent from the training set. This suggests that its predictive capability may be influenced by the availability of training data for specific drug combinations. Additionally, further testing and validation across additional datasets (beyond the current two) would be necessary to fully assess the algorithm's generalizability and robustness. It's also important to consider potential biases in the training data and to ensure that DIPx predictions are validated through empirical studies, including experimental testing of predicted combinations. Despite these limitations, DIPx represents a valuable step towards personalized prediction of drug synergy and warrants continued investigation and improvement. The manuscript would benefit from describing the algorithm's limitations with some examples and suggesting steps for future advancement.

      We are grateful to the reviewer for the thoughtful and encouraging comments, and for the time and effort to read our manuscript. We have carefully addressed them in our revision.

      Reviewer #3 (Recommendations For The Authors):

      The authors could consider some of the recommendations below to further improve the DIPx algorithm and its application in personalized drug synergy prediction. Firstly, expanding the training dataset to include a broader range of drug combinations could improve the algorithm's predictive capabilities, especially for novel combinations. This would help address the observed decrease in performance when predicting drug synergy for combinations absent from the training set. This could help assess the robustness of the algorithm and provide a more comprehensive evaluation of its performance for untrained combinations to strengthen its application.

      We agree that expanding the training dataset with a broader range of drug combinations would likely improve performance. However, the vast number of possible combinations, along with the associated cost of the experiment, limits the availability of drug combination data. To increase the size of the training data, we could combine different studies, but data from different studies are often generated using different protocols and experimental settings, introducing biases that complicate the integration. As technology continues to advance, we anticipate that more standardized and comprehensive data will become available in the future, which will help address this issue.

      Furthermore, the authors may consider incorporating additional features or data sources, such as drug-specific characteristics (e.g., availability of the drug), to enrich the information utilized by the algorithm. This could potentially improve the accuracy of the predictions and provide a more holistic understanding of the factors contributing to drug synergy.

      Indeed, incorporating additional information such as monotherapy data and drug-specific characteristics, as in TAIJI’s approach, could enhance overall prediction performance. As discussed in Point 5 below, the current study is focused on exploring the potential molecular mechanisms of drug combinations, rather than optimizing overall prediction accuracy. However, in its application, it is natural to add the monotherapy or drug-specific information into the algorithm, as done in TAIJI.

      Finally, conducting experimental studies to validate the predictions generated by DIPx in laboratory-based cell lines would be essential to confirm its accuracy and reliability. This could involve IC50 experiments validating a few of the predicted synergistic drug combinations and their associated pathway activations, which would strengthen the algorithm's clinical relevance. By considering these recommendations, the authors can further refine and advance the DIPx algorithm.

      We agree that laboratory-based validation, such as IC50 experiments for predicted synergistic drug combinations and pathway activations, would indeed strengthen the clinical relevance of the algorithm. We hope future studies can build on this work by incorporating this experimental validation.

      Below are my specific comments:

      Major comments:

      (1) The outputs of the DIPx algorithm are not clearly explained. It is unclear whether it provides only the Loewe score, the confidence score, the PAS score, or all of them. The output of the proposed algorithm needs to be clarified so that readers know what to expect when using it. The steps from PASs to synergy scores are also not well explained.

      We apologize for the lack of clarity. For any triplet (drug A + drug B, cell line C), DIPx outputs both the predicted Loewe score and the corresponding confidence score. PASs are the input data for the random forest algorithm, which maps them to the predicted synergy score. We do not describe the random forest in detail in the manuscript, but refer to Ishwaran et al. (2021). We have revised the first paragraph of the 'A Pathway-Based Drug Synergy Prediction Model' section (page 3) and Figure 1 to improve the presentation of the method.

      (2) In Figure 1, the predicted Loewe score for the Capivasertib + Sapitinib combination is not provided. However, Figures 1e and 4a show the pathways with the highest contribution for this combination. What is the predicted Loewe score for the Capivasertib + Sapitinib combination?

      Figures 1e and 4a present the pathways with the highest contribution for the combination; these pathways are identified from the drug-combination data across 12 cell lines, not from a single data point.

      We have added the median Loewe score (=7.6) across 12 cell lines in the test sets (Test 1 + Test 2) for the Capivasertib + Sapitinib combination in Figure 1e and reported related information for this combination in Supplementary Table S1. Additionally, we revised the 'Inference of the Mechanism of Action Based on PAS' section (page 7) to clarify the pathway importance inference.

      (3) In Figure 1d, the combination of doxorubicin + AZ12623380 is predicted to exhibit high Loewe synergy, with a confidence score of 0.33. It is important to provide details of this prediction, including the pathway predictions, and to explain why the model suggested high synergy. Although Figure 4f contains information, it seems to be listed for the observed Loewe score rather than the predicted score provided in Figure 1d. DIPx predicts the doxorubicin + AZ12623380 combination to be synergistic, while in Figure 4, it is labeled as a non-synergistic combination. It is necessary for the authors to clearly indicate which illustration represents the predicted outcome and which hypothesis is based on the observed Loewe score.

      In Figure 1d, we reported both the predicted and observed Loewe scores for the experiment (combination = doxorubicin + AZ12623380, cell line = SW900). Although the predicted score is high, the confidence score of 0.33 indicates a low chance that the combination is synergistic. This is indeed confirmed by the non-synergistic observed score of -6, so the prediction does not merit further investigation. This example highlights the value of the confidence score as a supplement to the predicted values.

      (4) Figure 3 - The external validation using ONeil requires more rigorous analysis to understand the biological significance of the predictions. It is important to provide pathway activation scores and their potential mechanism of action predicted by the DIPx algorithm when working with a new dataset. Additionally, including the predictions of TAIJI-M on the ONeil dataset would be beneficial for comparing the performance of both algorithms on a new dataset.

      We have included an example of potential pathways related to the MK2206 + Erlotinib combination in the ONeil cohort, as inferred by DIPx, in the last paragraph of the 'Inference of the Mechanism of Action Based on PAS' section (page 9). In this example, we identify 'Metabolism by CYP Enzymes' as the most significant pathway associated with this combination, which aligns with previous studies showing that both MK2206 and Erlotinib are metabolized by CYP enzyme families [PMID: 24387695].

      Regarding the prediction of TAIJI-M on the ONeil dataset, Reviewer 2 made a similar request (question 4), which we have carefully addressed above. Briefly, due to differences between the two datasets, we retrained TAIJI-M without methylation data to enable prediction on the ONeil dataset. (As previously reported, methylation data did not contribute significantly to the results of TAIJI, and TAIJI-M can only predict synergy scores for drug combinations present in the training set.) On this subset of drug combinations, both TAIJI-M and DIPx perform poorly, with Spearman correlations of r=0.05 and r=0.09, respectively. The poor performance could be attributed to the limited overlap of drugs between the ONeil dataset and the AZS DREAM Challenge dataset.

      (5) TAIJI by Li et al., 2018 reported a high prediction correlation (0.53) in their study, while the modified version of TAIJI, TAJI-M, shows a lower prediction correlation in this study. The authors should clarify why the performance decreased when using the same dataset. Is it because only molecular data was used, excluding the monotherapy drug-response data? There is a spelling error in calling the algorithm - it is reported as TAIJI by Li et al., 2018, whereas this study calls it TAJI - an "I" is missing in TAIJI throughout the manuscript.

      Indeed, TAIJI-M has a lower prediction correlation (0.38) than the full TAIJI model (0.53), which includes the monotherapy data. Some studies, such as [PMID: 33203866], even use only monotherapy data to predict drug-combination effects, underscoring the importance of monotherapy data in drug-combination prediction. However, DIPx focuses on exploring the potential molecular mechanisms of drug combinations rather than on overall prediction performance; we therefore exclude the monotherapy data from the analysis. We discuss this in the 'Validation and Comparisons in the AZS Dataset' section (page 4).

      We thank the reviewer for pointing out the spelling error in TAIJI; this has been corrected throughout the manuscript.

      (6) The authors should provide the predicted versus observed Loewe scores for all the combinations as a supplementary file. This would benefit the readers who want to replicate the results in the future. In the same way, including a sample output for the toy dataset on GitHub is required to assess the performance of the DIPx algorithm by a new user.

      All predicted and observed drug synergy scores are given in Supplementary Table S2. We have also uploaded a simple example to our GitHub page, along with detailed instructions for users on how to run the method, including generating PASs and training the prediction model. Since we do not have permission to host data from the AZS DREAM Challenge and the ONeil datasets on our GitHub page, users can download these datasets separately and directly apply the provided code.

      (7) GitHub can include all the input and output data to reproduce the correlation plots in the manuscript. GitHub could also include the modified version of TAIJI-M and its corresponding input for comparison. The methods section should include how TAIJI was performed.

      We have uploaded all the code and related data to the GitHub page to allow replication of all correlation plots in the manuscript. TAIJI-M is the molecular component of the full TAIJI model. Both TAIJI-M and TAIJI are documented on the GitHub page of the original study. We have also included a link to the source code for TAIJI-M and TAIJI in the 'Data Availability' section.

      (8) Figure 5 - the data associated with this figure need to be provided as a supplementary file listing the predicted Loewe scores for all the combinations.

      We report the associated data including the median of predicted and observed Loewe scores related to Figure 5c in Supplementary Table S2.

      Minor comments:

      (9) Abbreviations for the pathways are not included.

      We have included a list of abbreviations for all relevant pathways in Supplementary Table S5.

      (10) Line: 369. What is considered as bias correction? This needs to be explained.

      Bias correction refers to adjusting the original estimate of the Spearman correlation between the predicted and observed Loewe scores when there is a systematic difference between the estimates obtained from the bootstrap samples and the original correlation estimate. We revised the related text on page 13 to improve the explanation.
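      As a minimal illustration of the bias correction described above, the sketch below assumes the standard bootstrap bias estimate: the bias is the mean of the bootstrap Spearman estimates minus the original estimate, and the corrected value is the original estimate minus that bias. Resampling of (predicted, observed) pairs, the number of resamples (n_boot), and all function names are illustrative assumptions and are not taken from the DIPx code; only the Python standard library is used.

```python
import random
from statistics import mean


def average_ranks(values):
    """Rank values (1-based), giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def bias_corrected_spearman(predicted, observed, n_boot=1000, seed=0):
    """Return (original, bias-corrected) Spearman estimates.

    Assumption: standard bootstrap bias estimate, not the DIPx code.
    bias = mean(bootstrap estimates) - original; corrected = original - bias.
    """
    rng = random.Random(seed)
    n = len(predicted)
    original = spearman(predicted, observed)
    boot = []
    while len(boot) < n_boot:
        # Resample whole (predicted, observed) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [predicted[i] for i in idx]
        ys = [observed[i] for i in idx]
        # Skip resamples where either variable is constant (correlation undefined).
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue
        boot.append(spearman(xs, ys))
    bias = mean(boot) - original
    return original, original - bias
```

      Resampling whole pairs keeps each prediction matched with its observation; resamples in which either variable is constant are skipped because the correlation is undefined there.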

      (11) Line 364. Formulae or details for calculating actual predicted synergy (Ps) are missing.

      The predicted Loewe score, Ps, is the output of the regression random forest model. For simplicity, we do not describe the details in the manuscript, but refer to the original method article (Ishwaran et al., 2021). We have revised the text accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We appreciate that both reviewers found our findings significant and recognized the strength of the presented data in demonstrating the potential value of ASO-mediated Emc10 expression modulation for treating 22q11.2DS. We are grateful for the reviewers' valuable input and constructive suggestions, which we believe have significantly strengthened our manuscript. Below, we address the main points and concerns, followed by our point-by-point responses:

      Evaluation of ASO-Mediated Emc10 Reduction: We appreciate the feedback and the opportunity to clarify this point. While we agree that ASO-mediated reduction of Emc10 should ideally be evaluated at both the mRNA and protein levels, we would like to emphasize that this was indeed performed in our study. Specifically, we conducted both qRT-PCR and Western Blot (WB) assays on the same animal cohort, focusing on the left and right hippocampus (rather than the PFC) following ASO injection (see Figure S11C and D). We prioritized the hippocampus for the WB assay because our primary behavioral assays and observed phenotypes in this study are strongly hippocampus-centric. This approach reflects our aim to investigate Emc10's role in the brain regions most relevant to the observed phenotypes. We hope this clarification addresses the reviewer’s concerns. While protein-level analysis would ideally complement RNA measurements, the Emc10 antibodies available were suboptimal in specificity and sensitivity, requiring substantial optimization. Additionally, challenges in obtaining sufficient high-quality protein from small regions like the hippocampus limited the use of protein detection as a standalone method. We plan to refine antibody protocols or explore alternative methods in future work. Notably, in all instances where we performed parallel protein and RNA measurements in both mouse brain tissue and human cell lines, there was excellent concordance between the datasets, strongly suggesting that mRNA levels are a reliable indicator of Emc10 protein levels in our model.

      ASO Neuronal Uptake: While ASO uptake by neurons in the brain can vary considerably depending on factors such as ASO chemistry, delivery method, target brain region, and cell type, our targeted delivery approach, ASO design optimization, and ASO screening strategy were specifically tailored to achieve uniform and efficient uptake across hippocampal and cortical regions, in both neurons and glia. The figures included in our manuscript at both low and high magnification (see Figure S14A) clearly display the extensive (over 97%) overlap of ASO-positive cells (green signal) with cells expressing the neuronal marker NeuN (red signal). While quantifying ASO-positive cells in different brain regions could add value, the robust diffusion of ASO into neurons and glia is effectively demonstrated in the current figures and indirectly supported by the robust downregulation of Emc10 in ASO-treated animals as shown by qRT-PCR assays of hippocampal and cortical brain regions.

      Transcriptomic Data in Mutant EMC10 NGN2-iNs: Reduction in EMC10 levels is not expected to directly affect transcription or to broadly reorganize the differential gene expression profile of the Q6/Q5 patient/control NGN2-iN lines. Accordingly, our transcriptional profiling was not designed to assess the direct impact of EMC10 deficiency on gene expression but rather to serve as an indirect measure of cellular pathways affected by the reduction in EMC10 levels in the patient Q6 line. We aimed to identify genes and related functional pathways differentially expressed between the Q6/Q5 patient/control lines, where these expression differences are either abolished or significantly attenuated in Q6/EMC10<sup>HET</sup> or Q6/EMC10<sup>HOM</sup> NGN2-iNs.

      Statistical Analysis: We have meticulously reviewed all statistical analyses in the manuscript to ensure their appropriateness and adherence to established practices. For Figure S2, we acknowledge that the statistical details were not fully specified in the figure legend, though they are provided for each miRNA in Supplemental Table S2. In the revised manuscript, we have ensured that the statistical methods and corresponding values are clearly indicated for each comparison.

      We are confident that the revisions outlined above, along with the point-by-point responses provided below, will significantly strengthen our manuscript and address all the concerns raised by the reviewers. We would like to express our sincere thanks to the reviewers for their valuable feedback and constructive suggestions.

      Reviewer #1 (Recommendations For The Authors):

      My comments here are generally limited to minor comments that reflect possible small additions or edits to the manuscript:

      (1) Panel 1A is very small. Please consider making that bigger as space permits.

      We have increased the panel size of Figure 1A in the revised manuscript to improve its visibility and clarity.

      (2) Are you able to identify the dot that represents EMC10 in panel 1C? I understand that EMC10 is represented in Supplementary Figure 4A.

      We appreciate the reviewer's observation. In Figure 1C, the volcano plot depicts differentially expressed miRNAs in the Q5/Q6 neuronal samples, as identified through miRNA-sequencing. Because EMC10 is a protein-coding gene and a downstream target of miRNA dysregulation, it is not included in this analysis. However, as the reviewer correctly notes, EMC10 gene expression is represented in the volcano plot in Supplementary Figure 4A, which displays differentially expressed genes identified through bulk RNA-seq analysis of the same neuronal samples. To avoid any confusion, we have clarified the title of Figure 1C to emphasize that it represents miRNA expression changes.

      (3) With regard to the iPSC studies, some experiments are executed across multiple distinct pairs and some only in a single pair. Overall, while results are coherent and often complementary, would it be valuable for the authors to comment on experiments where studies in multiple pairs seemed particularly important, or others wherein it was less important?

      We thank the reviewer for this insightful question regarding our use of multiple versus single hiPSC pairs. Our investigation began with the Q5/Q6 sibling (dizygotic twin) pair, which shares the most similar genetic background. This minimized the impact of confounding genetic factors and provided a robust foundation for testing our hypothesis that EMC10 upregulation, driven by miRNA dysregulation, is a key consequence of the 22q11.2 deletion in human neurons, thus validating our previous findings from the Df(16)A<sup>+/-</sup> mouse model (Stark et al., 2008; Xu et al., 2013). To ensure the generalizability of our findings, we incorporated additional hiPSC lines from another sibling pair as well as a case/control pair, demonstrating that EMC10 upregulation is a consistent feature of 22q11.2DS. Subsequently, we focused on the well-matched Q5/Q6 pair for detailed morphological, functional, and genetic rescue experiments. This approach allowed us to perform in-depth studies while controlling for potential genetic confounders. By using both multiple and single hiPSC pairs, we balanced the need for generalizable findings with the practical considerations of conducting technically complex and resource-intensive experiments. This strategy enabled us to provide both broad and detailed insights into the mechanisms underlying 22q11.2DS. We have modified the introductory paragraph of the Results section to better highlight this issue.

      (4) While the majority of the experiments seem sufficiently powered to test the hypothesis in question in the iPSC studies, Figure 2B raises the question if the study replicates here were underpowered, and perhaps the authors might consider mentioning this, although this is a very minor comment.

      We thank the reviewer for raising this point. We acknowledge that the statistical power to detect a significant difference in pre-miR-485 levels in Figure 2B may be limited due to the relatively small sample size and the inherent variability in hiPSC-derived neuronal cultures. However, it is important to emphasize that the functional impact of miRNAs is primarily mediated by their mature transcript forms. Our miRNA-seq data (Supplementary Table 2 and Figure S2) did not show significant alterations in the levels of mature miR-485-5p or miR-485-3p. This finding aligns with the reported expression pattern of miR-485 in hiPSC-derived neurons, where relatively low levels are observed in early neuronal development, with increased expression occurring in older, more mature neurons (Soutschek et al. 2023; https://ethz-ins.org/igNeuronsTimeCourse/ database from the Institute of Neurogenomics, ETH Zurich). This database provides a valuable resource for examining gene expression dynamics during human neuronal differentiation. Given that our hiPSC-derived neurons were analyzed at a relatively early developmental stage (DIV8 for these experiments), it is likely that miR-485 expression had not yet reached levels sufficient to reveal significant differences. While we acknowledge the potential limitation in statistical power for detecting subtle changes in pre-miR-485 levels, the combined evidence suggests that miR-485 may not be a significant contributor to the observed phenotypes at this developmental stage.

      A paragraph has been added in the corresponding Results section to address this issue.

      (5) There are a few situations where the authors could help out the reader a little bit by providing more labels on the figures directly. For example: in Figure 2, there are expression levels, over-expression, and inhibition of miRNA but the X-axis is named with similar labels for the miRNAs in question for each of these distinct experiments. If the authors want to help the reader, they may consider labeling these panels with a descriptive title to reflect the experiment being done or use more descriptive terms in the X-axis panels. Again, this is minor. Similarly, in Figure 5, it might be helpful for the authors to help out the reader again with more labels on the panels, such as in Figures 5B, 5C, and 5D. Would they consider labeling these panels, HPC, PFC, SSC with the brain location as they did in Figure 4?

      We thank the reviewer for these helpful suggestions to improve the clarity of our figures. We have implemented the proposed changes. In Figure 2C-E, we have added specific titles to the panels to clearly distinguish between the different experimental conditions such as miRNA overexpression and inhibition. Similarly, in Figure 5, we labeled panels 5B, 5C, and 5D with the brain regions analyzed (HPC, PFC, SSC) to match the labeling used in Figure 4. We believe these revisions enhance the readability and overall interpretability of the figures, making it easier for readers to follow the experiments and results.

      (6) Figure 3: There is some overshoot of the data in EMC10 homozygous null, in panel 3E, and also, overshoot of the het in panel 3H. Would there be value in the authors commenting on the potential basis for this in the discussion? Some issues are minor, such as the lack of electrophysiological analysis of circuits in vivo or in ex vivo slices that may further support the proposed rescue.

      The reviewer correctly highlights the observation in Figures 3E and 3H, where the number of branch points in the Q6/EMC10<sup>HOM</sup> line exceeds wildtype levels and the calcium response in the Q6/EMC10<sup>HET</sup> and Q6/EMC10<sup>HOM</sup> lines surpasses that of the control. This overshoot is indeed intriguing and warrants discussion. EMC10 is part of the ER Membrane Complex (EMC), which plays a critical role in the proper folding and localization of various membrane proteins, including neurotransmitter receptors and ion channels such as voltage-gated calcium channels (Chitwood et al., 2018; Shurtleff et al., 2018; Chitwood and Hegde, 2019). In the context of the 22q11.2 deletion, EMC10 dysregulation may disrupt the proper localization of these proteins at the synapse, affecting both dendritic morphology and calcium signaling. The precise basis of this overshoot remains unclear. It may reflect a dosage-sensitive inhibitory effect of Emc10, whereby both reduced and increased expression alter normal neuronal processes. Upon gene restoration, the mutant system’s prior adaptation to dysfunction could then trigger excessive responses through altered receptor sensitivity or signaling dynamics. This underscores the critical importance of precise Emc10 expression for proper neuronal development and function, in line with previous findings suggesting that EMC10 plays an auxiliary or modulatory role in EMC function. A short comment on the potential basis for this overshoot has been added in the corresponding Results section of the manuscript. Regardless of the underlying mechanisms, these findings emphasize the importance of precise titration of ASO constructs, rigorous gene dosage controls, and thorough analysis of context-specific responses to ensure both efficacy and safety in clinical applications.

      We also agree with the reviewer that electrophysiological studies, particularly in the 22q11.2 deletion mouse model, would provide valuable insights into the impact of EMC10 modulation by ASOs on neuronal activity and circuit function at the in vivo and ex vivo levels. Incorporating such experiments into future studies will allow us to assess synaptic transmission and plasticity, contributing to a more comprehensive understanding of the therapeutic potential of ASO-mediated EMC10 modulation in 22q11.2DS.

      (7) Did the authors extend the behavioral studies beyond 9 weeks? Would the authors consider commenting on what they speculate the duration of the treatment effect might be, for both mice and, especially, humans?

      We thank the reviewer for raising the important question regarding the duration of the ASO treatment effect, which is crucial for translating our findings into clinically relevant therapies. While behavioral studies beyond 9 weeks were not conducted in this study, our in vivo experiments and findings from prior publications (detailed below) enable an informed speculative assessment.

      We utilized 2'-O-methoxyethyl (2'-MOE) modified ASOs, known for their enhanced binding affinity, nuclease resistance, and increased metabolic stability. In our in vivo post-injection screening of ASOs (Figure S13C), we predicted that Emc10 expression levels return to normal WT levels (~100%) approximately 26 weeks post-treatment in Emc10<sup>ASO</sup> (#1466182) treated mice. This prediction is supported by our Emc10 expression profiles across various brain regions, which demonstrate robust repression of Emc10 lasting up to 10 weeks post-administration (Figure 6D-F). While these findings suggest that the treatment effect in our model could extend significantly beyond 10 weeks following a single ASO injection, further empirical validation is required through extended follow-up studies. Encouragingly, long-term effects of 2'-MOE ASOs have been observed in other neurological disorders (Kordasiewicz et al., 2012; Scoles et al., 2017; Finkel et al., 2017; Darras et al., 2019). However, factors such as ASO distribution, target cell turnover, and disease-specific pathophysiology could influence the duration of the effect. To address these uncertainties, we have added a paragraph in the Discussion section emphasizing the need for additional studies, including extended follow-up periods and eventual clinical trials, to determine the specific duration of effect for our Emc10<sup>ASO</sup> constructs in treating 22q11.2DS.

      Reviewer #2 (Recommendations For The Authors):

      (1) It is acknowledged that the iPSC-derived cells in Figure 1 are no longer progenitors, but differentiation markers for astrocytes and glia are also needed in Figure 1b to establish that equal rates of differentiation have occurred across genotypes.

      We thank the reviewer for raising this important point about ensuring equal rates of differentiation across genotypes. As the reviewer notes, we employed a well-established protocol for directed differentiation of hiPSCs into cortical neurons using a combination of small molecule inhibitors, as previously described by Qi et al. (2017). This protocol has been extensively validated and is known to robustly generate cortical neurons while actively suppressing glial differentiation, as evidenced by the lack of upregulation of glial markers such as GFAP, AQP4, or OLIG2 in the original study. Given the established neuronal specificity of this protocol and our focus on neuronal phenotypes, we prioritized the confirmation of successful neuronal differentiation using the established neuronal markers TUJ1 and TBR1. Therefore, additional markers for astrocytes and glia are not included in this figure, as we did not expect significant glial differentiation under these conditions. A sentence has been added in the corresponding Results section to address this issue.

      (2) For the RNA-seq experiments outlined in Figures 3J and K, a more comprehensive analysis is needed of the genes disrupted in the parental Q6 line relative to the het and homo lines. What percent are rescued, unaffected, vs uniquely disrupted?

      Reduction in EMC10 levels is not expected to directly affect transcription or broadly reorganize the gene expression profile of the Q6/Q5 NGN2-iN lines. Our transcriptional profiling was not designed to assess the direct impact of EMC10 deficiency on gene expression but rather to measure the cellular pathways affected by reduced EMC10 in the patient Q6 line. We identified genes differentially expressed between the Q6 (patient) and Q5 (control) lines, whose expression differences were either abolished or significantly attenuated ("rescued") in the Q6/EMC10<sup>HET</sup> or Q6/EMC10<sup>HOM</sup> lines. In the Q6/EMC10<sup>HET</sup> line, 237 DEGs (6%) were rescued, while in the Q6/EMC10<sup>HOM</sup> line, 382 DEGs (11%) were rescued. Importantly, further analysis revealed 103 rescued DEGs shared between these lines, an overlap that was statistically significant (enrichment factor = 1.7; p < 0.0001, based on a hypergeometric test). We added a new figure panel (Figure 3L) to visualize the significant overlap of rescued DEGs from the Q6/EMC10<sup>HET</sup> and Q6/EMC10<sup>HOM</sup> lines. This overlap suggests these genes play a critical role in biological pathways impacted by EMC10 levels, particularly in nervous system development, as indicated by our functional annotation analysis. We also performed protein-protein interaction (PPI) network analysis to explore the functional relationships among these 103 shared DEGs (Figure S8). Future studies will further investigate these gene sets to gain deeper insights into the molecular mechanisms underlying 22q11.2DS and the role of EMC10.
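      The overlap test mentioned above can be sketched with the hypergeometric distribution. This is a minimal illustration only: it assumes the enrichment factor is the observed overlap divided by the overlap expected for two random gene lists of the same sizes, and it requires the size of the gene universe from which both rescued lists are drawn, which is not stated in this response. The function and argument names are hypothetical.

```python
from math import comb


def overlap_enrichment(n_universe, n_a, n_b, n_shared):
    """Return (enrichment factor, one-sided hypergeometric p-value).

    Hypothetical helper. Assumption: enrichment = observed / expected overlap,
    where the expected overlap of two random lists is n_a * n_b / n_universe.
    The p-value is P(X >= n_shared) for X ~ Hypergeometric(n_universe, n_a, n_b).
    """
    expected = n_a * n_b / n_universe
    total = comb(n_universe, n_b)  # ways to draw list B from the universe
    upper = min(n_a, n_b)          # largest possible overlap
    # Sum the exact hypergeometric probabilities in the upper tail.
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(n_shared, upper + 1)
    )
    return n_shared / expected, tail / total
```

      For example, overlap_enrichment(10, 5, 5, 5) returns an enrichment factor of 2.0 and a p-value of 1/252, because the only way to reach an overlap of 5 is for both 5-gene lists to coincide. Exact integer arithmetic via math.comb avoids floating-point underflow in the intermediate binomial coefficients.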

      (3) The authors claim that 50% EMC10 loss in adult mice is safe and should be toned down. EMC10 knockout mice have motor, anxiety, and social phenotypes. It would be unique amongst highly dosage-sensitive genes (MeCP2, CDKL5, TCF4, FMR1, etc.) for there to only be a neurodevelopmental component. In all those cases, and others, the effects of over and under-expression are reversible into adulthood. Establishing the range in adults is critical to establishing therapeutic utility. Absent a detailed examination of non-cognitive phenotypes, this claim cannot be made.

      The reviewer raises an important point about the potential effects of EMC10 reduction in adult mice and the need to establish a safe therapeutic window by evaluating both cognitive and non-cognitive phenotypes. We agree that such a comprehensive evaluation is critical for assessing the safety and translational potential of Emc10-targeting therapies. While the International Mouse Phenotyping Consortium reported motor and anxiety phenotypes in homozygous Emc10 knockout mice, these data are unpublished and based on a relatively small number of animals. Furthermore, in our previous work (Diamantopoulou et al., 2017), we demonstrated that complete Emc10 loss does not impair cognition or social behavior, as assessed by prepulse inhibition (PPI), working memory (WM), and social memory (SM) assays (see Figure 3A-D; Diamantopoulou et al., 2017). Additionally, heterozygous Emc10 mice, which exhibit a ~50% reduction in Emc10 expression similar to that achieved with our ASO treatment, showed no evidence of motor deficits or anxiety-like behavior. Specifically, Emc10<sup>+/-</sup> mice displayed locomotor activity comparable to WT mice in the open field (OF) test (Figure S4A, Diamantopoulou et al., 2017). Moreover, genetic normalization of Emc10 expression in Df(16)A<sup>+/-</sup> mice produced no signs of anxiety-like behavior, as assessed by the OF test (Figure S4A) and the elevated plus maze (EPM) (Figure S4B; Diamantopoulou et al., 2017). To further support these findings, we have added new data to the current manuscript (Figure S10J) showing that TAM-mediated restoration of Emc10 levels in the brains of adult Df(16)A<sup>+/-</sup> mice did not affect the time that mutant mice spent in the center area of the OF, suggesting that Emc10 reduction does not influence anxiety-related behavior. Taken together, these results suggest that a 50% reduction in EMC10 expression is unlikely to result in motor or anxiety-like phenotypes in adult mice.
      Finally, as noted in the manuscript, beyond the findings from animal models, a substantial number of relatively rare LoF variants or potentially damaging missense variants in the human EMC10 gene have been identified among likely healthy individuals in gnomAD, a database largely devoid of individuals known to be affected by severe neurodevelopmental disorders (NDDs).

      Nevertheless, the Discussion has been revised to underscore the importance of establishing a more detailed safety profile, including non-cognitive phenotypes, to fully validate the therapeutic potential of Emc10-targeting approaches. It also highlights the need for future studies to expand on these evaluations, addressing this critical aspect and laying a stronger foundation for advancing these findings into clinical drug development.

      (4) Supplemental Figure 10: The protein validation of Emc10 knockout following tamoxifen injection needs to be validated in all brain regions, not just the PFC. This is particularly important as the rest of the paper focuses on HPC-mediated phenotypes.

      First, we want to emphasize that we conducted both qRT-PCR and WB assays on the same animal cohort, specifically examining the left and right hippocampus following ASO injection (see Figure S11C and D). This approach is crucial, given the central role of the hippocampus in the phenotypes investigated in our ASO-mediated Emc10 knockdown experiments.

      The reviewer raises an important point regarding the validation of EMC10 reduction at the protein level across all relevant brain regions using the Emc10 conditional knockout strain. We agree that such validation would ideally confirm the efficacy of our tamoxifen-induced knockout model comprehensively. However, we hope the reviewer appreciates that obtaining sufficient high-quality protein for WB analysis from smaller brain regions like the hippocampus poses a significant technical challenge. This difficulty is compounded by the need to reserve the same samples for qRT-PCR to ensure consistency between mRNA and protein measurements. Our data from ASO-mediated Emc10 knockdown experiments (Figures S11C-D) demonstrate a clear and consistent correlation between reductions in Emc10 mRNA and protein levels in both the left and right hippocampus. Furthermore, in our constitutive Emc10-knockout mouse model (Diamantopoulou et al., 2017; see Figure S1A-B), we observed strong agreement between mRNA and protein levels, supporting the reliability of mRNA data as a proxy for EMC10 protein levels in our experiments. Likewise, in all instances where we performed parallel protein and RNA measurements in human cell lines, the datasets were in excellent concordance. Thus, while we acknowledge the limitations of relying primarily on mRNA data, we are confident that the Emc10 mRNA expression data in Figure S10 accurately reflect protein-level changes across brain regions in our conditional knockout model. To address this concern more fully in the future, we are working to refine antibody detection and optimize our protein extraction protocols to enable more routine and precise protein-level validation in smaller brain regions. We appreciate the reviewer’s feedback and will continue to refine our methodologies to strengthen the robustness of our findings.

      (5) Figure 3: 1 way ANOVA would be more appropriate to analyze the data in B-G than t-tests.

      We appreciate the suggestion of the reviewer. As mentioned above, we carefully selected statistical tests appropriate for each analysis. For Figure 3B-G, we chose to use pairwise t-tests to address specific hypotheses regarding the disease phenotype and rescue effects. This approach is consistent with prior experimental studies in the field, including our own (e.g., Xu et al., 2013; Figure 7H-I). Importantly, most of our t-tests yielded highly significant results (p < 0.001 or p < 0.01), reinforcing the robustness of our findings.

      (6) Figure 5-6: Protein data is needed to complement the mRNA knockdown data.

      We agree with the reviewer on the importance of protein-level validation to complement the mRNA knockdown data. As mentioned in our response to Reviewer’s Comment (4), in all instances where we performed parallel protein and RNA measurements, either in mouse brain or human cell lines, we observed excellent concordance between the datasets. This supports the reliability of our mRNA data as a proxy for protein changes. Nevertheless, we acknowledge the value of including protein validation in future experiments and will consider incorporating it to further strengthen our findings.

      (7) The use of additional phenotypic measures is applauded in Figure 6; however, more is needed to interpret the data appropriately. Shao et al 2021 (Figure S9) show data from the International Mouse Phenotyping Consortium claiming EMC10 KO mice have gait, activity, and anxiety phenotypes. All of these parameters could impact the SM assay and the Y-maze assay. Changes in SM interaction time could be linked to anxiety or motor impairments but interpreted as cognitive deficits because these symptoms were not assessed. At a minimum, discussion is needed about this limitation, as well as the inclusion of distance explored in the SM and Y-maze assays.

      We thank the reviewer for their insightful comment regarding the potential influence of locomotor, gait, or anxiety phenotypes on the observed deficits in the SM and Y-maze assays. The behavioral phenotypes reported for Emc10 knockout mice by the International Mouse Phenotyping Consortium (https://www.mousephenotype.org/data/genes/MGI:1916933) were limited to homozygous female mice and based on a small sample size (4–6 females) compared to a larger WT control group. Moreover, these data are unpublished and thus challenging to evaluate fully. Importantly, no abnormal behaviors were reported for Emc10 heterozygous knockout mice in these datasets. Additionally, the claim by Shao et al. (2021), citing our previous work (Diamantopoulou et al., 2017), that Emc10 knockout mice show cognitive impairments is inaccurate.

      Our analysis of both the constitutive Emc10 knockout model (Diamantopoulou et al., 2017) and the current conditional Emc10 heterozygous knockout model consistently demonstrates that Emc10 reduction does not affect locomotor activity or anxiety-like behavior. In our earlier characterization of constitutive heterozygous Emc10 knockout mice (Emc10<sup>+/-</sup>), we observed no signs of anxiety-like behavior or motor impairments in OF assays (see Figure 2A-B and Figure S4A, Diamantopoulou et al., 2017). Similarly, results from Df(16)A<sup>+/-</sup> mice with genetically normalized Emc10 expression [Df(16)A<sup>+/-</sup>; Emc10<sup>+/-</sup>] also showed no indications of anxiety-like behavior or locomotor changes in the OF and EPM assays (see Figure S4A-B, Diamantopoulou et al., 2017). Consistent with these findings, our current data from Df(16)A<sup>+/-</sup> mice with conditional Emc10 reduction in the brain show no significant differences in locomotor activity and anxiety-related measures as assessed by OF assays (Figure S10J). Furthermore, total arm entries in Y-maze assays conducted in Df(16)A<sup>+/-</sup> mice treated with Emc10 ASOs were comparable to controls (Figures S14C and G-H), providing additional support for the conclusion that locomotor activity remains unaffected in these models.

      We further appreciate the reviewer’s suggestion that changes in social interaction time during the SM assay could be influenced by anxiety or motor impairments. However, we consider this scenario unlikely in our model. Interaction times during the first trial of the SM assay, which measures general social interest, are comparable between Df(16)A<sup>+/-</sup> mice with reduced Emc10 expression (either genetically or through ASO treatment) and WT controls (see Figures 4E, 5E, and S10G). These findings indicate that our mouse models do not exhibit inherent difficulties in initiating social interaction, as might be expected if motor impairments or heightened anxiety were present. Reduced social interaction is commonly used as a behavioral marker for anxiety in rodent studies (reviewed by Bailey and Crawley, Anxiety-Related Behaviors in Mice, 2009). “Anxious” mice typically exhibit decreased social interaction, spending less time engaging with other mice compared to non-anxious counterparts. However, the specific deficit we observe in the second trial of the SM assay—when mice are reintroduced to a familiar juvenile—is indicative of impaired social recognition memory, as previously documented for Df(16)A<sup>+/-</sup> mice (Piskorowski et al., 2016; Donegan et al., 2020). This deficit is distinct from the general social avoidance typically associated with heightened anxiety.

      Based on our comprehensive assessment of locomotor activity, anxiety-related behaviors, and social interaction, we conclude that the observed rescue of social memory and spatial memory deficits in mice with reduced Emc10 expression is most likely due to improved cognitive function rather than alterations in motor or anxiety-related domains.

      (8) For ASO optimization experiments, it is not sufficient to claim robust uptake. A quantitative measure is needed using the PO antibody showing what percentage of cells were positive for the ASO. Since the contention is that only Emc10 in excitatory neurons is important, it would be helpful if this also included a breakdown of ASO uptake in excitatory and inhibitory neurons and astrocytes.

      We thank the reviewer for highlighting the importance of quantifying ASO uptake and assessing cell-type specificity. To address this, we have added new high-magnification images to Figure S14A. These images show widespread ASO uptake (green) that extensively colocalizes with the neuronal marker NeuN (red) in both the hippocampus and prefrontal cortex. Quantitative analysis of this overlap indicates that over 97% of NeuN-positive neurons were ASO-positive, demonstrating efficient neuronal uptake. This robust neuronal uptake, together with the significant normalization of Emc10 levels in RNA assays and the behavioral improvements observed in ASO-treated Df(16)A<sup>+/-</sup> mice, supports the efficacy of our approach in modulating Emc10 expression within the relevant neuronal populations.

      (9) An interpretation is needed in Figure S3 as to why ~50% of the pathways increased are also present on the decreased list. Ie. G1/transition, viral reproductive process, pos regulator of cell stress, etc. 4/10 GO terms are present in both increased and decreased groups in A and 7/10 in B.

      We thank the reviewer for pointing out the overlap between pathways enriched in both the upregulated and downregulated miRNA groups in Figure S3. This overlap likely reflects the complex nature of miRNA regulation, where individual miRNAs can target multiple genes within a pathway, and single genes can be regulated by multiple miRNAs, sometimes with opposing effects (reviewed in Bartel, 2009; Bartel, 2018). For example, in the “G1/S transition” pathway, upregulated miRNAs such as miR-92a-3p, miR-92b-3p, and miR-34a-5p may promote the transition by targeting cell cycle regulators like FBXW7, CDKN1C, and CDK6 (Zhou et al., 2015; Zhao et al., 2021; Oda et al., 2024). Conversely, downregulated miRNAs such as miR-143-3p and miR-200b are known to suppress the transition by targeting genes such as HK2 and GATA-4 (Zhou et al., 2015; Yao et al., 2013). Our analysis identified overlapping predicted target genes for both upregulated and downregulated miRNAs, supporting the notion that many genes are subject to complex regulation by multiple miRNAs with potentially synergistic or antagonistic effects. Thus, the enrichment of certain GO terms in both groups likely reflects this intricate interplay of miRNA-mediated gene regulation. Future investigations focusing on specific miRNA-target interactions within these pathways will be critical to fully elucidate the underlying mechanisms and better understand the functional consequences of these opposing regulatory effects.

      Minor Concerns:

      (1) Define SM before using it.

      We have defined the SM assay in the main text upon its first mention, where we describe the assay and its relevance to cognitive function (see page 11 of the revised manuscript).

      (2) Statistics have been run in Figure S2, but not presented. The text only states that the differences between groups are significant. Please add in.

      We have revised the legend of Figure S2 to include the specific statistical test used (Student’s t-tests) and the corresponding p-values.

      (3) The switch from ASO1 to ASO2 between Figures 5 and 6 needs more discussion. Why were new ASOs generated when ASO1 worked?

      We thank the reviewer for their question regarding the transition from Emc10<sup>ASO1</sup> to Emc10<sup>ASO2</sup> between Figure 4 and Figures 5-6. Emc10<sup>ASO1</sup> served as our initial proof-of-concept ASO construct, successfully demonstrating the feasibility of inhibiting Emc10 mRNA expression and providing evidence for behavioral rescue in our mouse model. As outlined in the manuscript, Emc10<sup>ASO2</sup> targets a different region of the Emc10 transcript (intron 1, Figure 5A) than Emc10<sup>ASO1</sup> (intron 2, Figure 4A). This distinction provides an additional layer of validation for our targeting strategy and ensures specificity in modulating Emc10 expression. In addition, Emc10<sup>ASO1</sup> exhibited limited distribution in the brain, primarily targeting the hippocampus with weaker inhibition of Emc10 in other regions such as the cortex (Figure 4C, right panel). Emc10<sup>ASO2</sup> overcame this limitation and achieved broader brain distribution, as demonstrated by the qRT-PCR data in Figure 5C. Given that 22q11.2DS can affect multiple brain regions and cognitive domains beyond the hippocampus, achieving broader distribution of the ASO is critical for a more comprehensive assessment of therapeutic potential.

      (4) Page 3: Define "LoF"

      We have defined loss-of-function (LoF) at its first mention in the Introduction, where we discuss the potential of using LoF mutations to devise therapeutic interventions (see page 3 of the revised manuscript).

      References

      Bailey and Crawley, Anxiety-Related Behaviors in Mice, In: Methods of Behavior Analysis in Neuroscience. 2nd edition. Boca Raton (FL): CRC Press/Taylor & Francis; Chapter 5, (2009).

      Bartel, MicroRNAs: target recognition and regulatory functions, Cell 136(2):215-33, (2009).

      Bartel, Metazoan MicroRNAs, Cell, 173(1):20-51, (2018).

      Chitwood et al., EMC Is Required to Initiate Accurate Membrane Protein Topogenesis, Cell 175, 1507-1519 e1516, (2018).

      Chitwood and Hegde, The Role of EMC during Membrane Protein Biogenesis, Trends Cell Biol. (5):371-384, (2019).

      Darras et al., Nusinersen in later-onset spinal muscular atrophy: Long-term results from the phase 1/2 studies, Neurology 92(21), (2019).

      Diamantopoulou et al., Loss-of-function mutation in Mirta22/Emc10 rescues specific schizophrenia-related phenotypes in a mouse model of the 22q11.2 deletion, Proc Natl Acad Sci U S A 114, E6127-E6136, (2017).

      Donegan et al., Coding of social novelty in the hippocampal CA2 region and its disruption and rescue in a 22q11.2 microdeletion mouse model, Nat Neurosci 23, 1365-1375, (2020).

      Finkel et al., Nusinersen versus Sham Control in Infantile-Onset Spinal Muscular Atrophy, N Engl J Med 377(18):1723-1732, (2017).

      Kordasiewicz et al., Sustained therapeutic reversal of Huntington's disease by transient repression of huntingtin synthesis, Neuron 74(6):1031-44, (2012).

      Oda et al., MicroRNA-34a-5p: A pivotal therapeutic target in gallbladder cancer, Mol Ther Oncol, 32(1):200765, (2024).

      Piskorowski et al., Age-Dependent Specific Changes in Area CA2 of the Hippocampus and Social Memory Deficit in a Mouse Model of the 22q11.2 Deletion Syndrome. Neuron 89, 163-176, (2016).

      Qi et al., Combined small-molecule inhibition accelerates the derivation of functional cortical neurons from human pluripotent stem cells. Nat Biotechnol 35, 154-163, (2017).

      Scoles et al., Antisense oligonucleotide therapy for spinocerebellar ataxia type 2, Nature 544(7650):362-366, (2017).

      Shao et al., A recurrent, homozygous EMC10 frameshift variant is associated with a syndrome of developmental delay with variable seizures and dysmorphic features, Genet Med 23, 1158-1162, (2021).

      Shurtleff et al., The ER membrane protein complex interacts cotranslationally to enable biogenesis of multipass membrane proteins, Elife 7, (2018).

      Soutschek et al., A human-specific microRNA controls the timing of excitatory synaptogenesis, bioRxiv, (2023).

      Stark et al., Altered brain microRNA biogenesis contributes to phenotypic deficits in a 22q11-deletion mouse model. Nat Genet 40, 751-760, (2008).

      Xu et al., Derepression of a neuronal inhibitor due to miRNA dysregulation in a schizophrenia-related microdeletion, Cell 152, 262-275, (2013).

      Yao et al., miR-200b targets GATA-4 during cell growth and differentiation, RNA Biol 10(4):465-8, (2013).

      Zhao et al., miR-92b-3p Regulates Cell Cycle and Apoptosis by Targeting CDKN1C, Thereby Affecting the Sensitivity of Colorectal Cancer Cells to Chemotherapeutic Drugs, Cancers 13(13):3323, (2021).

      Zhou et al., miR-92a is upregulated in cervical cancer and promotes cell proliferation and invasion by targeting FBXW7, Biochem Biophys Res Commun 458(1):63-9, (2015).

      Zhou et al., MicroRNA-143 acts as a tumor suppressor by targeting hexokinase 2 in human prostate cancer, Am J Cancer Res 5(6):2056-6, (2015).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, the authors study the effects of synaptic activity on the process of eye-specific segregation, focusing on the role of caspase 3, classically associated with apoptosis. The method for synaptic silencing is elegant and requires intrauterine injection of a tetanus toxin light chain into the eye. The authors report that this silencing leads to increased caspase 3 in the contralateral eye (Figure 1) and demonstrate evidence of punctate caspase 3 that does not overlap neuronal markers like map2. However, the quantifications showing increased caspase 3 in the silenced eye (done at P5) are complicated by overlap with the signal from entire dying cells in the thalamus. The authors also show that global caspase 3 deficiency impairs the process of eye-specific segregation and circuit refinement (Figures 3-4).

      The reviewer states: “this silencing leads to increased caspase 3 in the contralateral eye”. We observed increased caspase-3 activity, not protein levels, in the contralateral dLGN, not eye.

      The reviewer states: “and demonstrate evidence of punctate caspase 3 that does not overlap neuronal markers like map2”. We do not believe that this statement is accurate, as we show that the punctate active caspase-3 signals overlap with the dendritic marker MAP2 (Figure S4A).

      The reviewer also states: “the quantifications showing increased caspase 3 [activity] in the silenced [dLGN] (done at P5) are complicated by overlap with the signal from entire dying cells in the thalamus”. We do not believe that this statement is accurate. The apoptotic neurons we observed are relay neurons (confirmed by their morphology and positive NeuN staining; Figure S4B-C) located in the dLGN. The dLGN is clearly labeled by expression of fluorescent proteins in RGCs, and only caspase-3 activity within the dLGN area is analyzed. These are therefore not “cells” of unknown lineage in the general “thalamus” area, as the reviewer’s comment suggests. If the dying cells were non-neuronal, that would indeed confound our quantification and conclusions, but that is not the case.

      We argue that whole-cell caspase-3 activation in dLGN relay neurons is a bona fide response to synaptic silencing by TeTxLC and therefore should be included in the quantification. We have two sets of controls: one compares the strongly inactivated dLGN with the weakly inactivated dLGN in the same TeTxLC-injected animal; the other compares the dLGN of TeTxLC-injected animals with that of mock-injected animals. In both comparisons, only the dLGNs receiving strong synapse inactivation have more apoptotic relay neurons, demonstrating that these cells occur because of synapse inactivation. It is also unlikely that our perturbation causes cell death through a non-synaptic mechanism. Since mock injections do not cause apoptosis in dLGN neurons, this phenomenon is not related to surgical damage. TeTxLC is injected into the eyes and expressed only in presynaptic RGCs, not in postsynaptic relay neurons, so this phenomenon is also unlikely to be caused by TeTxLC-related toxicity. Furthermore, if apoptosis of dLGN relay neurons were unrelated to synapse inactivation, injecting TeTxLC into both eyes should produce the same number of apoptotic relay neurons or more; instead, we observed a reduction in dLGN neuron apoptosis, suggesting that synapse-related mechanisms are responsible. Considering the above, occasional whole-cell caspase-3 activation in relay neurons in TeTxLC-inactivated dLGN is causally linked to synapse inactivation and should be included in the quantification.

      We also revised the manuscript to better explain the possible mechanistic connection between localized caspase-3 activity and whole-cell caspase-3 activity. We propose that whole-cell caspase-3 activation occurs because of uncontrolled accumulation of localized caspase-3 activation. Please see lines 127-140 and 403-413 for details.

      Additionally, we would like to clarify that we are not claiming that synapse inactivation leads to only localized caspase-3 activation or only whole-cell caspase-3 activation, as is suggested by the editors and reviewers in the eLife assessment. We have clearly stated in the manuscript that both types of signals were observed. However, we reasoned that, because whole-cell caspase-3 activation in unperturbed dLGNs – which undergo normal synapse elimination – is infrequently observed, whole-cell caspase-3 activation may not be a significant driver of synapse elimination during normal development. In this revision, we included a new experiment to corroborate this hypothesis. If whole-cell caspase-3 activation in dLGN relay neurons is a prevalent phenomenon during normal development, such caspase-3 activity would lead to significant death of dLGN relay neurons during normal development. Consequently, if we block caspase-3 activation by deleting caspase-3, the number of relay neurons in the dLGN should increase. However, in support of our hypothesis, we observed comparable numbers of relay neurons in Casp3<sup>+/+</sup> and Casp3<sup>-/-</sup> mice. Please see Figure S7 for details.

      The authors also report that "synapse weakening-induced caspase-3 activation determines the specificity of synapse elimination mediated by microglia but not astrocytes" (abstract). They report that microglia engulf fewer RGC axon terminals in caspase 3 deficient animals (Figure 5), and that this preferentially occurs in silenced terminals, but this preferential effect is lost in caspase 3 knockouts. Based on this, the authors conclude that caspase 3 directs microglia to eliminate weaker synapses. However, a much simpler and critical experiment that the authors did not perform is to eliminate microglia and show that the caspase 3 dependent effects go away. Without this experiment, there is no reason to assume that microglia are directing synaptic elimination.

      The reviewer states: “microglia engulf fewer RGC axon terminals in caspase 3 deficient animals (Figure 5), and that this preferentially occurs in silenced terminals, but this preferential effect is lost in caspase 3 knockouts”. We are not sure what the reviewer means by “this preferentially occurs in silenced terminals”. Our results show that microglia preferentially engulf silenced terminals, and such preference is lost in caspase-3 deficient mice (Figure 6).

      We do not understand the experiment where the reviewer suggested to: “eliminate microglia and show that the caspase 3 dependent effects go away”. To quantify caspase-3 dependent engulfment of synaptic material by microglia or preferential engulfment of silenced terminals by microglia, microglia must be present in the tissue sample. If we eliminate microglia, neither of these measurements can be made. What could be measured if microglia are eliminated is the refinement of retinogeniculate pathway. This experiment would test whether microglia are required for caspase-3 dependent phenotypes. This is not a claim made in the manuscript. Instead, we claimed caspase-3 is required for microglia to engulf weak synapses, as supported by the evidence presented in Figure 6.

      We did not claim that “microglia are directing synaptic elimination”. Our claim is that synapse inactivation induces caspase-3 activity, and caspase-3 activation in turn leads to engulfment of weak synapses by microglia. Based on this model, it is the neuronal activity that fundamentally directs synapse elimination. Synapse engulfment by microglia is only a readout we used to measure the outcome of activity-dependent synapse elimination. We have revised all sections in the manuscript that are related to synapse engulfment by microglia to emphasize the logic of this model.

      We have also revised the abstract and title of the paper to better align it with our main claims, removed the reference to astrocytes, and clarified that microglia engulfment measurements are used as readouts of synapse elimination.

      Finally, the authors also report that caspase 3 deficiency alters synapse loss in 6-month-old female APP/PS1 mice, but this is not really related to the rest of the paper.

      We respectfully disagree that Figure 7 is not related to the rest of the paper. Many genes involved in postnatal synapse elimination, such as C1q and C3, have been implicated in neurodegeneration. It is therefore natural and important to ask whether the function of caspase-3 in regulating synaptic homeostasis extends to neurodegenerative diseases in adult animals. The answer to this question may have broad therapeutic impacts.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript by Yu et al. demonstrates that activation of caspase-3 is essential for synapse elimination by microglia, but not by astrocytes. This study also reveals that caspase-3 activation-mediated synapse elimination is required for retinogeniculate circuit refinement and eye-specific territory segregation in the dLGN in an activity-dependent manner. Inhibition of synaptic activity increases caspase-3 activation and microglial phagocytosis, while caspase-3 deficiency blocks microglia-mediated synapse elimination and circuit refinement in the dLGN. The authors further demonstrate that caspase-3 activation mediates synapse loss in AD; loss of caspase-3 prevented synapse loss in AD mice. Overall, this study reveals that caspase-3 activation is an important mechanism underlying the selectivity of microglia-mediated synapse elimination during brain development and in neurodegenerative diseases.

      Strengths:

      A previous study (Gyorffy B. et al., PNAS 2018) has shown that caspase-3 signal correlates with C1q tagging of synapses (mostly using in vitro approaches), which suggests that caspase-3 could be an underlying mechanism of microglial selection of synapses for removal. The current study provides direct in vivo evidence demonstrating that caspase-3 activation is essential for microglial elimination of synapses in both brain development and neurodegeneration.

      The paper is well-organized and easy to read. The schematic drawings are helpful for understanding the experimental designs and purposes.

      Weaknesses:

      It seems that astrocytes contain large amounts of engulfed materials from ipsilateral and contralateral axon terminals (Figure S11B) and that caspase-3 deficiency also decreased the volume of engulfed materials by astrocytes (Figures S11C, D). So the possibility that astrocyte-mediated synapse elimination contributes to circuit refinement in dLGN cannot be excluded.

      We would like to clarify that we do not claim that astrocytes are unimportant for synapse elimination or circuit refinement. We acknowledge that the claim made in the original submitted manuscript that caspase-3 does not regulate synapse elimination by astrocytes lacks strong supporting evidence. We have removed this claim and revised the section related to synapse engulfment by astrocytes to provide a more rigorous interpretation of our data. We also removed the section in discussion regarding distinct substrate preferences of microglia and astrocytes.

      Does blocking single or dual inactivation of synapse activity (using TeTxLC) increase microglial or astrocytic engulfment of synaptic materials (of one or both sides) in dLGN?

      We assume that by “blocking single or dual inactivation of synapse activity”, the reviewer refers to inactivating retinogeniculate synapses from one or both eyes.

      We showed that inactivating retinogeniculate synapses from one eye (single inactivation) increases engulfment of inactive synapses by microglia (Figure 6). We did not measure synapse engulfment by microglia while inactivating retinogeniculate synapses from both eyes (dual inactivation). However, based on the total active caspase-3 signal (Figure 2) in the dual inactivation scenario, we do not expect to see an increase in engulfment of synaptic material by microglia.

      We did not measure astrocyte-mediated engulfment with single or dual inactivation, as we did not see a robust caspase-3 dependent phenotype in synapse engulfment by astrocytes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the Authors):

      (1) Figure 1 - It is not clear from this figure whether the authors are measuring caspase 3 in dendritic compartments or in dying relay neurons in the thalamus. The authors state that "either" whole cell death (1B) or smaller punctate signals (1F) were observed. When quantifying "photons" in Figure 1E, it appears most of the signal captured will be of dying relay neurons. What determined which signal was observed, and what is being quantified in Figure 1E? This also applies to the quantifications being reported in Figure 2.

      The quantification includes both types of signal: it is the sum of all active caspase-3 signal within the dLGN boundary. We note that there is a significant amount of punctate signal in the TeTxLC-inactivated dLGN. Unfortunately, due to file compression, these signals are not clearly visible in the submitted manuscript file. We have provided high-resolution figures in this revision.

      As argued above in the response to the public review, apoptotic relay neurons in TeTxLC-inactivated dLGN (not the general thalamus area) occur as a direct consequence of synapse inactivation. Therefore, active caspase-3 signals in these relay neurons should be included in the quantification.

      We believe it is the extent of synapse inactivation (i.e., the number of synapses that are inactivated) that determines whether dLGN relay neuron apoptosis occurs. Such apoptosis is expected given the nature of the apoptosis signaling cascade. In the intrinsic apoptosis pathway, release of cytochrome-c from mitochondria induces cleavage of the initiator caspase, caspase-9, which in turn cleaves the executioner caspases, caspase-3/7, causing apoptosis. Caspase-3 can cleave upstream factors in the apoptosis pathway, leading to explosive amplification of caspase-3 activity (McComb et al., DOI: 10.1126/sciadv.aau9433). When a relay neuron receives a few inactivated synapses, caspase-3 activation in the postsynaptic dendrite can remain local (as we observed in Figure 1), constrained by mechanisms such as proteasomal degradation of cleaved caspase-3 (Erturk et al., DOI: 10.1523/JNEUROSCI.3121-13.2014). However, when a relay neuron receives many inactivated synapses, the cumulative caspase-3 activity induced in the dendrite can overwhelm negative regulation and, through positive feedback amplification, lead to significantly higher levels of caspase-3 activity in entire dendrites (Figure S4B), eventually leading to caspase-3 activation in entire relay neurons. Please see lines 127-140 and 403-413 for our discussion in the main text.

      (2) Figure 5 - Figures 5c-d and Fig 6 are confounded by pseudoreplication, whereby performing statistics on 50-60 microglia inflates statistical significance. Could the authors show all these data per mouse?

      If we understand the reviewer correctly, the reviewer is suggesting that reporting measurements from multiple microglia in one animal constitutes pseudoreplication. This is correct in a strict sense, as microglia in the same animal are more likely to be similar than microglia from different animals. In the revised version, we have plotted the data by animal in Figures S11 and S13. The observations remain valid. However, we would like to point out that averaging measurements from all microglia in each animal and reporting by mouse is very conservative, as measurements from microglia in the same animal still vary greatly due to cell-to-cell differences.
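      To make the per-animal reanalysis concrete, the sketch below collapses per-microglia measurements to one mean per mouse, so that the number of independent observations equals the number of animals rather than the number of cells. The record format, the function names, and the use of Welch's t statistic are illustrative assumptions, not the analysis pipeline used for Figures S11 and S13.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def per_animal_means(records):
    """Collapse per-cell measurements into one mean value per animal.

    records: iterable of (animal_id, value) pairs (hypothetical format).
    """
    groups = defaultdict(list)
    for animal_id, value in records:
        groups[animal_id].append(value)
    return {animal: mean(values) for animal, values in groups.items()}


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(var_a / len(a) + var_b / len(b))


# Hypothetical values for illustration only, not data from the study.
control = per_animal_means([("c1", 1.2), ("c1", 0.8), ("c2", 1.1), ("c3", 0.9)])
mutant = per_animal_means([("k1", 0.5), ("k1", 0.7), ("k2", 0.4), ("k3", 0.6)])

# n = 3 animals per group, regardless of how many cells each animal contributed.
t = welch_t(list(control.values()), list(mutant.values()))
```

      Averaging within each animal first discards within-animal variability, which is why this comparison is conservative when cell-to-cell variation is large.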

      (3) Although the authors are not the only ones to use this strategy, it is worth noting that performing all microglial experiments in Cx3cr1 heterozygotes could lead to alterations in microglial function that may not be reflective of their homeostatic roles.

      We acknowledge that Cx3cr1 heterozygosity could cause alterations in microglial physiology.

      Note, however, that the engulfment assay in Figure 5 compares microglia in Cx3cr1<sup>+/-</sup>; Casp3<sup>+/-</sup> and Cx3cr1<sup>+/-</sup>; Casp3<sup>-/-</sup> animals. Therefore, Cx3cr1 heterozygosity is controlled for in our experiment, and the observed difference in engulfed synaptic material in microglia is specific to caspase-3 deficiency. We acknowledge, though, that the magnitude of this difference could be affected by Cx3cr1 heterozygosity.

      It is important to note that we did not perform all microglia engulfment analyses using Cx3cr1<sup>+/-</sup> mice. We have edited the manuscript to make this clearer. In the activity-dependent microglia engulfment analysis performed in Figure 6, we used Casp3<sup>+/+</sup> and Casp3<sup>-/-</sup> animals and detected microglia with anti-Iba1 immunostaining. Therefore, Cx3cr1 heterozygosity is not a concern for this experiment.

      Minor:

      (1) Figures are presented out of order, which makes the manuscript difficult to follow.

      We have revised text regarding the segregation analysis to align with the order of figures.

      (2) Figure S3 is very confusing- the terms "left" and "right" are used in three or four partly overlapping contexts (which eye, which injection, which panel or subpanel of the figure is being referred to). Would this not be more appropriately analyzed with a repeated measures ANOVA (multiple comparisons not necessary) rather than multiple separate T-tests?

      We have revised Figure S3 and S5 with better annotation and legends.

      Yes, it is possible to use a repeated-measures two-way ANOVA. This analysis reports a significant effect of genotype (df = 1, sum of squares and mean square = 0.0001081, F(1,13) = 7.595, p = 0.0164). We used multiple separate t-tests because we wanted to show how genotype effects change with increasing thresholds, whereas the two-way ANOVA provides only one overall p-value.

      (3) Could the authors clarify why the percentage overlap (in the controls) is so different between Figure 3C and Figure S3C, and why different thresholds are applied?

      This difference is primarily due to the difference in age. The data in Figure 3 and Figure S5 were acquired at P10, while those in Figure S3 were acquired at P8. Although the segregation process is largely complete by P8, segregation continues from P8 to P10. Therefore, overlap measured at P10 will be lower than that measured at P8. If we compare overlap at the same threshold (e.g., 10%) and at the same age in Figure 3 and S5, the overlap is very similar.

      The choice of threshold is related to the labeling method. In Figure 3, RGC terminals are labeled with Alexa Fluor-conjugated cholera toxin subunit B (CTB). In Figures S3 and S5, RGC axons are labeled by expression of fluorescent proteins. CTB labels only membrane surfaces, but at fine scales it yields stronger and slightly different signals than fluorescent proteins, which fill the cell. For Figures S3 and S5 (which use fluorescent protein labeling), higher thresholds such as those used in Figure 3 (which uses CTB labeling) can be applied and the same trend still holds, but the data are noisier. Regardless of the small difference in thresholds used, the important observation is that the defects in TeTxLC-injected or caspase-3 deficient animals are clear across multiple thresholds.
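      As a toy illustration of why overlap depends on the threshold, the sketch below computes the percentage of dLGN pixels in which both eye-specific channels exceed a given threshold, and repeats this over a range of thresholds. It assumes background-subtracted intensities scaled to 0..1 and overlap expressed relative to all dLGN pixels; these conventions and the function names are hypothetical and are not the exact procedure used in the manuscript.

```python
def percent_overlap(contra, ipsi, threshold):
    """Percentage of pixels in which both channels exceed `threshold`.

    contra, ipsi: equal-length sequences of pixel intensities inside the dLGN,
    each assumed to be background-subtracted and scaled to 0..1.
    """
    if len(contra) != len(ipsi) or len(contra) == 0:
        raise ValueError("channels must be non-empty and the same size")
    both = sum(1 for c, i in zip(contra, ipsi) if c > threshold and i > threshold)
    return 100.0 * both / len(contra)


def overlap_curve(contra, ipsi, thresholds):
    """Overlap at each threshold, so group differences can be checked across the range."""
    return {t: percent_overlap(contra, ipsi, t) for t in thresholds}


# Toy four-pixel example (not real data): overlap falls as the threshold rises.
curve = overlap_curve([0.5, 0.2, 0.9, 0.0], [0.6, 0.8, 0.1, 0.0], [0.1, 0.3, 0.5])
```

      Because a stronger label (such as CTB) shifts the intensity distribution, the same biological overlap can cross a given threshold differently under the two labeling methods, which is why comparing genotypes across several thresholds is more informative than a single cutoff.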

      (4) Many describe the eye-specific segregation process as being complete "between P8-10". Other studies have quantified ESS at P10 (Stevens 2007). The authors state they did all quantifications at P8 (l. 82) and refer to Figure 3, but Figure 3 shows images from P10, whereas Figure S3 shows data from P8.

      We did not say we performed all quantification at P8. In line 85, we said “To validate the efficacy of our synapse inactivation method, we injected AAV-hSyn-TeTxLC into the right eye of wildtype E15 embryos and analyzed the segregation of eye-specific territories at postnatal day 8 (P8), when the segregation process is largely complete”. The age of postnatal day 8 in this context is specifically referring to the experiment shown in Figure S3. For the segregation analysis in Figure 3, we specifically stated that the experiment was conducted at P10 (line 277).

      Although the experiment in Figure S3 was conducted at P8, and Figure S5 and Figure 3 show results at P10, each dataset included appropriate age-matched controls. P8 is generally considered an age at which segregation is mostly complete, and it is sufficient for us to assess the potency of AAV-delivered TeTxLC on eye-specific segregation. We do not think performing the experiment shown in Figure S3 at P8 affects the interpretation of the data.

      (5) Is Figure 6 also using Cx3cr1 GFP to label microglia? This is not clarified.

      We apologize for this oversight. In Figure 6 microglia are labeled by anti-Iba1 immunostaining. We have clarified this in figure legends and text.

      Reviewer #2 (Recommendations for the Authors):

      (1) The authors quantified the caspase-3 activity using immunostaining and confocal microscopy (Figures 1B-E). They may need to verify the result (increased level of activated caspase-3 upon synapse inactivation) using alternative methods, such as western blotting.

      Both western blot and immunostaining are based on antibody-antigen interaction. These two methods are not likely sufficiently independent. Additionally, to perform a western blot, we would need to surgically collect the TeTxLC-inactivated dLGN to avoid sample contamination from other brain regions. Such collection at the age we are interested in (P5) is very challenging. We have tested the anti-cleaved caspase-3 antibody using caspase-3 deficient mice and we can confirm it is a highly specific antibody that doesn’t generate signal in the caspase-3 deficient tissue samples.

      (2) Does caspase-3 deficiency alter the density of microglia or astrocytes in dLGN?

      No. Neither microglial nor astrocyte density changed with caspase-3 deficiency. In the case of microglia, we find that the mean density of microglia per unit area of dLGN is virtually the same in wild type and caspase-3 deficient mice (two-tailed t test P = 0.8556, 6 wild type and 5 Casp3<sup>-/-</sup> mice). Some overviews showing microglia in the dLGN of wild type and caspase-3 deficient mice can be found in Figure S10. Similarly, we did not observe overt changes in astrocyte density in the dLGN linked to caspase-3 deficiency.

      (3) During dLGN eye-specific segregation in normal developing animals, did the authors observe different levels of activated caspase-3 in different regions (territories)?

      For normal developing animals, the activated caspase-3 signal is generally sparse, and it is difficult to distinguish whether the signal is related to synapse elimination. For animals receiving TeTxLC-injection, we did notice that in the dLGN contralateral to the injection, where most inactivated synapses are located, the punctate caspase-3 signal tends to concentrate on the ventral-medial side of the dLGN (Figure 1B), which is the region preferentially innervated by the contralateral eye.

      (4) Recording of NMDAR-mediated synaptic currents may not be necessary for demonstrating that caspase 3 is essential for dLGN circuit refinement. In addition, the PPR may not necessarily reflect the number of innervations that a dLGN neuron receives. Instead, showing the changes in the frequency of mEPSCs (or synapse/spine density) may be more supportive.

      Thank you for the comment. We have performed the suggested mEPSC measurements and reported the results in revised Figure 4D-F.

      (5) Why is caspase 3 activation enhanced (compared to control) only at 4 months of age, when A-beta deposition has not formed yet, but not at later time points in AD mice (Figure S17)?

      A prevailing hypothesis in the field is that the most neurotoxic form of A-beta is the soluble oligomeric form, not the fibrillar form that leads to plaque deposition. As the oligomeric form appears before plaque deposition, the enhanced caspase-3 activation we observed at 4 months may reflect an increase in oligomeric A-beta, which occurs before any visible A-beta plaque formation.

      (6) The manuscript can be made more concise, and the figures more organized.

      We removed superfluous details and corrected text-figure mismatches in the revised manuscript to improve readability.

    1. Author response:

      We would like to express our gratitude to all three reviewers for their time and valuable feedback on the manuscript. Below, we provide our point-by-point responses to their comments. Additionally, we summarize here the experiments we plan to conduct in accordance with the reviewers' suggestions:

      Revision plan 1. To include live imaging of Dl/Notch trafficking in normal and GlcT mutant ISCs.

      We agree that the effect of GlcT mutation on Dl trafficking was not convincingly demonstrated in our previous work. Although we attempted live imaging of the intestine using Dl tagged with GFP at its C terminus, the fluorescent signal was regrettably too weak to capture reliably. In this revision, we will optimize the imaging conditions to determine whether this issue can be resolved. Alternatively, we will transiently express GFP/RFP-tagged Dl in both normal and mutant ISCs to investigate the trafficking dynamics through live imaging.

      Revision plan 2. To update and improve the presentation of the data regarding the features of early/late/recycling endosomes in GlcT mutant ISCs.

      Our analysis of Rab5 and Rab7 endosomes in both normal and GlcT mutant ISCs revealed that Dl tends to accumulate in Rab5 endosomes in GlcT mutant ISCs. To strengthen our findings, we will include additional quantitative data and conduct further analysis on recycling endosomes labeled with Rab11-GFP. We acknowledge that this portion of the data is not entirely convincing, and in accordance with the reviewers' suggestions, we will revise our conclusions to present a more tempered interpretation.

      Revision plan 3. To include western blot analysis of Dl in normal and GlcT mutant ISCs.

      While we propose that MacCer may function as a component of lipid rafts, facilitating the anchorage of Dl on the membrane and its proper endocytosis, it is also possible that it acts as a substrate for the modification of Dl, which is essential for its functionality. To investigate this further, we will conduct Western blot analysis to determine whether the depletion of GlcT alters the protein size of Dl.

      Please find our detailed point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      From a forward genetic mosaic mutant screen using EMS, the authors identify mutations in glucosylceramide synthase (GlcT), a rate-limiting enzyme for glycosphingolipid (GSL) production, that result in EE tumors. Multiple genetic experiments strongly support the model that the mutant phenotype caused by GlcT loss is due to failure to convert ceramide into glucosylceramide. Further genetic evidence suggests that Notch signaling is compromised in the ISC lineage, possibly through effects on the endocytosis of Delta. Loss of GlcT does not affect wing development or oogenesis, suggesting tissue-specific roles for GlcT. Finally, an increase in goblet cells in UGCG knockout mice, not previously reported, suggests a conserved role for GlcT in Notch signaling in intestinal cell lineage specification.

      Strengths:

      Overall, this is a well-written paper with multiple well-designed and executed genetic experiments that support a role for GlcT in Notch signaling in the fly and mammalian intestine. I do, however, have a few comments below.

      Weaknesses:

      (1) The authors bring up the intriguing idea that GlcT could be a way to link diet to cell fate choice. Unfortunately, there are no experiments to test this hypothesis.

      We indeed attempted to establish an assay to investigate the impact of various diets (such as high-fat, high-sugar, or high-protein diets) on the fate choice of ISCs. Subsequently, we intended to examine the potential involvement of GlcT in this process. However, we observed that the number or percentage of EEs varies significantly among individuals, even among flies with identical phenotypes subjected to the same nutritional regimen. We suspect that the proliferative status of ISCs and the turnover rate of EEs may significantly influence the number of EEs present in the intestinal epithelium, complicating the interpretation of our results. Consequently, we are unable to conduct this experiment at this time. The hypothesis suggesting that GlcT may link diet to cell fate choice remains an avenue for future experimental exploration.

      (2) Why do the authors think that UGCG knockout results in goblet cell excess and not in the other secretory cell types?

      This is indeed an interesting point. In the mouse intestine, it is well-documented that the knockout of Notch receptors or Delta-like ligands results in a classic phenotype characterized by goblet cell hyperplasia, with little impact on the other secretory cell types. This finding aligns very well with our experimental results, as we noted that the numbers of Paneth cells and enteroendocrine cells appear to be largely normal in UGCG knockout mice. By contrast, increases in other secretory cell types are typically observed under conditions of pharmacological inhibition of the Notch pathway.

      (3) The authors should cite other EMS mutagenesis screens done in the fly intestine.

      To our knowledge, the EMS screen of chromosome arm 2L conducted in Allison Bardin’s lab is the only one prior to this work; it led to two publications (Perdigoto et al., 2011; Gervais et al., 2019). We will include citations for both papers in the revised manuscript.

      (4) The absence of a phenotype using NRE-Gal4 is not convincing. This is because the delay in its expression could be after the requirement for the affected gene in the process being studied. In other words, sufficient knockdown of GlcT by RNAi would not be achieved until after the relevant signaling between the EB and the ISC had occurred. Dl-Gal4 is problematic as an ISC driver because Dl is expressed in the EEP.

      We agree that the lack of an observable phenotype using NRE-Gal4 might be attributed to a delay in its expression, which could result in missing the critical window necessary for effective GlcT knockdown. Consequently, we cannot rule out the possibility that GlcT may also play a role in early EBs or EEPs. We will revise our manuscript to present a more cautious conclusion on this issue.

      (5) The difference in Rab5 between control and GlcT-IR was not that significant. Furthermore, any changes could be secondary to increases in proliferation.

      We agree that it is possible that the observed increase in proliferation could influence the number of Rab5+ endosomes, and we will temper our conclusions on this aspect accordingly. However, it is important to note that, although the difference in Rab5+ endosomes between the control and GlcT-IR conditions appeared mild, it was statistically significant and reproducible. As we have indicated earlier, we plan to further analyze Rab11+ endosomes, as this additional analysis may provide further support for our previous conclusions.

      Reviewer #2 (Public review):

      Summary:

      This study genetically identifies two key enzymes involved in the biosynthesis of glycosphingolipids, GlcT and Egh, which act as tumor suppressors in the adult fly gut. Detailed genetic analysis indicates that a deficiency in mactosyl-ceramide (Mac-Cer) causes tumor formation. Analysis of a Notch transcriptional reporter further indicates that the lack of Mac-Cer is associated with reduced Notch activity in the gut, but not in other tissues.

      Addressing how a change in the lipid composition of the membranes might lead to defective Notch receptor activation, the authors studied the endocytic trafficking of Delta and claimed that internalized Delta appeared to accumulate faster into endosomes in the absence of Mac-Cer. Further analysis of Delta steady-state accumulation in fixed samples suggested a delay in the endosomal trafficking of Delta from Rab5+ to Rab7+ endosomes, which was interpreted to suggest that the inefficient, or delayed, recycling of Delta might cause a loss in Notch receptor activation.

      Finally, the histological analysis of mouse guts following the conditional knock-out of the GlcT gene suggested that Mac-Cer might also be important for proper Notch signaling activity in that context.

      Strengths:

      The genetic analysis is of high quality. The finding that a Mac-Cer deficiency results in reduced Notch activity in the fly gut is important and fully convincing.

      The mouse data, although preliminary, raised the possibility that the role of this specific lipid may be conserved across species.

      Weaknesses:

      This study is not, however, without caveats and several specific conclusions are not fully convincing.

      First, the conclusion that GlcT is specifically required in Intestinal Stem Cells (ISCs) is not fully convincing for technical reasons: NRE-Gal4 may be less active in GlcT mutant cells, and the knock-down of GlcT using Dl-Gal4ts may not be restricted to ISCs given the perdurance of Gal4 and of its downstream RNAi.

      As previously mentioned, we acknowledge that a role for GlcT in early EBs or EEPs cannot be completely ruled out. We will revise our manuscript to present a more cautious conclusion and explicitly describe this possibility in the updated version.

      Second, the results from the antibody uptake assays are not clear: i) the levels of internalized Delta were not quantified in these experiments; ii) additionally, live guts were incubated with anti-Delta for 3 hr. Because of this long incubation period, the observed results may not necessarily reflect the dynamics of endocytosis of antibody-bound Delta, but might also reflect the distribution of intracellular Delta following the internalization of unbound anti-Delta. It would thus be interesting to examine the level of internalized Delta in experiments with shorter incubation times.

      We thank the reviewer for these excellent questions. In our antibody uptake experiments, we noted that Dl reached its peak accumulation after a 3-hour incubation period. We recognize that quantifying internalized Dl would enhance our analysis, and we will include the corresponding statistical graphs in the revised version of the manuscript. In addition, we agree that during the 3-hour incubation, the potential internalization of unbound anti-Dl cannot be ruled out, as it may influence the observed distribution of intracellular Dl. To address this concern, we plan to supplement our findings with live imaging experiments to capture the dynamics of Dl endocytosis in GlcT mutant ISCs.

      Overall, the proposed working model needs to be solidified, as important questions remain open, including: Is the endo-lysosomal system, i.e. the steady-state distribution of endo-lysosomal markers, affected by the Mac-Cer deficiency? Is the trafficking of Notch also affected by the Mac-Cer deficiency? Is the rate of Delta endocytosis also affected by the Mac-Cer deficiency? Are the levels of cell-surface Delta reduced upon the loss of Mac-Cer?

      Regarding the impact on the endo-lysosomal system, this is indeed an important aspect to explore. While we did not conduct experiments specifically designed to evaluate the steady-state distribution of endo-lysosomal markers, our analyses utilizing Rab5-GFP overexpression and Rab7 staining did not indicate any significant differences in endosome distribution in MacCer deficient conditions. Moreover, we still observed high expression of the NRE-LacZ reporter specifically at the boundaries of clones in GlcT mutant cells (Fig. 4A), indicating that GlcT mutant EBs remain responsive to Dl produced by normal ISCs located right at the clone boundary. Therefore, we propose that MacCer deficiency may specifically affect Dl trafficking without impacting Notch trafficking.

      In our 3-hour antibody uptake experiments, we observed a notable decrease in cell-surface Dl, which was accompanied by an increase in intracellular accumulation. These findings collectively suggest that Dl may be unstable on the cell surface, leading to its accumulation in early endosomes.

      Third, while the mouse results are potentially interesting, they seem to be relatively preliminary, and future studies are needed to test whether the level of Notch receptor activation is reduced in this model.

      In the mouse small intestine, Olfm4 is a well-established target gene of the Notch signaling pathway, and its staining provides a reliable indication of Notch pathway activation. While we attempted to evaluate Notch activation using additional markers, such as Hes1 and NICD, we encountered difficulties, as the corresponding antibody reagents did not perform well in our hands. Despite these challenges, we believe that our findings with Olfm4 provide an important starting point for further investigation in the future.

      Reviewer #3 (Public review):

      Summary:

      In this paper, Tang et al. report the discovery of a glucosylceramide synthase gene, GlcT, which they found in a genetic screen for mutations that generate tumorous growth of stem cells in the gut of Drosophila. The screen was expertly done using a classic mutagenesis/mosaic method. Their initial characterization of the GlcT alleles, which generate endocrine tumors much like mutations in the Notch signaling pathway, is also very nice. Tang et al. checked other enzymes in the glycosylceramide pathway and found that the loss of one gene just downstream of GlcT (Egh) gives similar phenotypes to GlcT, whereas three genes further downstream do not replicate the phenotype. Remarkably, dietary supplementation with a predicted GlcT/Egh product, lactosyl-ceramide, was able to substantially rescue the GlcT mutant phenotype. Based on the phenotypic similarity of the GlcT and Notch phenotypes, the authors show that activated Notch is epistatic to GlcT mutations, suppressing the endocrine tumor phenotype, and that GlcT mutant clones have reduced Notch signaling activity. Up to this point, the results are all clear, interesting, and significant. Tang et al. then go on to investigate how GlcT mutations might affect Notch signaling, and present results suggesting that GlcT mutation might impair the normal endocytic trafficking of Delta, the Notch ligand. These results (Fig X-XX), unfortunately, are less than convincing; either more conclusive data should be brought to support the Delta trafficking model, or the authors should limit their conclusions regarding how GlcT loss impairs Notch signaling. Given the results shown, it's clear that GlcT affects EE cell differentiation, but whether this is via directly altering Dl/N signaling is not so clear, and other mechanisms could be involved. Overall, the paper is an interesting, novel study, but it lacks somewhat in providing mechanistic insight. With conscientious revisions, this could be addressed. We list below specific points that Tang et al. should consider as they revise their paper.

      Strengths:

      The genetic screen is excellent.

      The basic characterization of GlcT phenotypes is excellent, as is the downstream pathway analysis.

      Weaknesses:

      (1) Lines 147-149, Figure 2E: here, the study would benefit from quantitations of the effects of loss of brn, B4GalNAcTA, and a4GT1, even though they appear negative.

      We will incorporate the quantifications for the effects of the loss of brn, B4GalNAcTA, and a4GT1 in the updated Figure 2.

      (2) In Figure 3, it would be useful to quantify the effects of LacCer on proliferation. The suppression result is very nice, but only effects on Pros+ cell numbers are shown.

      We will add quantifications of the number of EEs per clone to the updated Figure 3.

      (3) In Figure 4A/B we see less NRE-LacZ in GlcT mutant clones. Are the data points in Figure 4B per cell or per clone? Please note. Also, there are clearly a few NRE-LacZ+ cells in the mutant clone. How does this happen if GlcT is required for Dl/N signaling?

      In Figure 4B, the data points represent the fluorescence intensity per single cell within each clone. It is true that a few NRE-LacZ+ cells can still be observed within the mutant clone; however, this does not contradict our conclusion. As noted, high expression of the NRE-LacZ reporter was specifically observed around the clone boundaries in MacCer deficient cells (Fig. 4A), indicating that the mutant EBs can normally receive Dl signal from the normal ISCs located at the clone boundary and activate the Notch signaling pathway. Therefore, we believe that, although affecting Dl trafficking, MacCer deficiency does not significantly affect Notch trafficking.

      (4) Lines 222-225, Figure 5AB: The authors use the NRE-Gal4ts driver to show that GlcT depletion in EBs has no effect. However, this driver is not activated until well into the process of EB commitment, and RNAi takes several days to work, so the authors' conclusion that GlcT is "specifically required in ISCs" and not at all in EBs may be erroneous.

      As previously mentioned, we acknowledge that a role for GlcT in early EBs or EEPs cannot be completely ruled out. We will revise our manuscript to present a more cautious conclusion and describe this possibility in the updated version.

      (5) Figure 5C-F: These results relating to Delta endocytosis are not convincing. The data in Fig 5C are not clear and not quantitated, and the data in Figure 5F are so widely scattered that it seems these co-localizations are difficult to measure. The authors should either remove these data, improve them, or soften the conclusions taken from them. Moreover, it is unclear how the experiments tracing Delta internalization (Fig 5C) could actually work. This is because for this method to work, the anti-Dl antibody would have to pass through the visceral muscle before binding Dl on the ISC cell surface. To my knowledge, antibody transcytosis is not a common phenomenon.

      We thank the reviewer for these insightful comments and suggestions. In our in vivo experiments, we observed increased co-localization of Rab5 and Dl in GlcT mutant ISCs, indicating that Dl trafficking is delayed at the transition to Rab7⁺ late endosomes, a finding that is further supported by our antibody uptake experiments. We acknowledge that the data presented in Fig. 5C are not fully quantified and that the co-localization data in Fig. 5F may appear somewhat scattered; therefore, we will include additional quantification and enhance the data presentation in the revised manuscript.

      Regarding the concern about antibody internalization, we appreciate this point. We currently do not know whether the antibody reaches the ISC cell surface by passing through the visceral muscle or via other routes. Because the experiment was conducted with fragmented guts, the antibody may have penetrated the tissue through mechanisms independent of transcytosis.

      As mentioned earlier, we plan to supplement our findings with live imaging experiments to investigate the dynamics of Dl/Notch endocytosis in both normal and GlcT mutant ISCs. Nevertheless, given the technical challenges and potential pitfalls associated with these assays, we agree that this part of the data is not fully convincing, and we will present a more cautious conclusion in the revised manuscript.

      (6) It is unclear whether MacCer regulates Dl-Notch signaling by modifying Dl directly or by influencing the general endocytic recycling pathway. The authors say they observe increased Dl accumulation in Rab5+ early endosomes but not in Rab7+ late endosomes upon GlcT depletion, suggesting that the recycling endosome pathway, which retrieves Dl back to the cell surface, may be impaired by GlcT loss. To test this, the authors could examine whether recycling endosomes (marked by Rab4 and Rab11) are disrupted in GlcT mutants. Rab11 has been shown to be essential for recycling endosome function in fly ISCs.

      We agree that assessing the state of recycling endosomes, especially by using markers such as Rab11, would be valuable in determining whether MacCer regulates Dl-Notch signaling by directly modifying Dl or by influencing the broader endocytic recycling pathway. We will incorporate these experiments into our future experimental plans to further characterize Dl trafficking in GlcT mutant ISCs.

      (7) It remains unclear whether Dl undergoes post-translational modification by MacCer in the fly gut. At a minimum, the authors should provide biochemical evidence (e.g., Western blot) to determine whether GlcT depletion alters the protein size of Dl.

      While we propose that MacCer may function as a component of lipid rafts, facilitating Dl membrane anchorage and endocytosis, we also acknowledge the possibility that MacCer could serve as a substrate for protein modifications of Dl necessary for its proper function. Conducting biochemical analyses to investigate potential post-translational modifications of Dl by MacCer would indeed provide valuable insights. To address this, we will incorporate Western blot analysis into our experimental plan to determine whether GlcT depletion affects the protein size of Dl.

      (8) It is unfortunate that GlcT doesn't affect Notch signaling in other organs of the fly. This brings the Delta trafficking model into question, and the authors should note this. Also, the clonal marker in Figure 6C is not clear.

      In the revised working model, we will explicitly specify that the events occur in intestinal stem cells. Regarding Figure 6C, we will delineate the clone with a white dashed line to enhance its clarity and visual comprehension.

      (9) The authors state that loss of UGCG in the mouse small intestine results in a reduced ISC count. However, in Supplementary Figure C3, Ki67, a marker of ISC proliferation, is significantly increased in UGCG-CKO mice. This contradiction should be clarified. The authors might repeat this experiment using an alternative ISC marker, such as Lgr5.

      Previous studies have indicated that dysregulation of the Notch signaling pathway can result in a reduction in the number of ISCs. While we did not perform a direct quantification of ISC numbers in our experiments, our olfm4 staining—which serves as a reliable marker for ISCs—demonstrates a clear reduction in the number of positive cells in UGCG-CKO mice.

      The increased Ki67 signal we observed reflects enhanced proliferation in the transit-amplifying region, and it does not directly indicate an increase in ISC number. Therefore, in UGCG-CKO mice, we observe a decrease in the number of ISCs, while there is an increase in transit-amplifying (TA) cells (progenitor cells). This increase in TA cells is probably a secondary consequence of the loss of barrier function associated with the UGCG knockout.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors propose a transformer-based model for the prediction of condition- or tissue-specific alternative splicing and demonstrate its utility in the design of RNAs with desired splicing outcomes, which is a novel application. The model is compared to relevant existing approaches (Pangolin and SpliceAI), and the authors clearly demonstrate its advantage. Overall, this is a compelling method that is well thought out and evaluated.

      Strengths:

      (1) The model is well thought out: rather than modeling a cassette exon using a single generic deep learning model, as has been done, e.g., in SpliceAI and related work, the authors propose a modular architecture that focuses on different regions around a potential exon skipping event, which enables the model to learn representations that are specific to those regions. Because each component in the model focuses on a short, fixed-length sequence segment, the model can learn position-specific features. Another difference from Pangolin and SpliceAI, which focus on modeling individual splice junctions, is that this model captures a complete alternative splicing event.

      (2) The model is evaluated rigorously: it is compared to the most relevant state-of-the-art models, it follows machine learning best practices, and an ablation study demonstrates the contribution of each component of the architecture.

      (3) Experimental work supports the computational predictions.    

      (4) The authors use their model for sequence design to optimize splicing outcomes, which is a novel application.

      We wholeheartedly thank Reviewer #1 for these positive comments regarding the modeling approach we took to this task and the evaluations we performed. We have put a lot of work and thought into this and it is gratifying to see the results of that work acknowledged like this.

      Weaknesses:

      No weaknesses were identified by this reviewer, but I have the following comments:

      (1) I would be curious to see evidence that the model is learning position-specific representations.

      This is an excellent suggestion to further assess what the model is learning. We have several ideas on how to test this, which we plan to report in the revised version.

      (2) The transformer encoders in TrASPr model sequences with a rather limited sequence size of 200 bp; therefore, for long introns, the model will not have good coverage of the intronic sequence. This is not expected to be an issue for exons.

      Yes, that is a good suggestion: we can stratify predictions by intron length, and we will report on that in the revision.

      (3) In the context of sequence design, creating a desired tissue- or condition-specific effect would likely require disrupting or creating motifs for splicing regulatory proteins. In your experiments for neuronal-specific Daam1 exon 16, have you seen evidence for that? Most of the edits are close to splice junctions, but a few are further away.

      That is another good question and suggestion. In the original paper describing the mutation locations, some motif similarities were noted to PTB (CU) and CUG/Mbnl-like elements (Barash et al Nature 2010). We could revisit this now with an RBP motif database such as http://rbpdb.ccbr.utoronto.ca/. We note that ENCODE data come from human cell lines and therefore cannot be used for this, but we will also look for mouse CLIP and knockdown (KD) data supporting such regulatory findings.

      (4) In designing a tissue- or condition-specific effect for neuronal-specific Daam1 exon 16, the upstream exonic splice junction had the most sequence edits. Is that a general observation? What is the relative importance of the four transformer regions to TrASPr's prediction performance?

      This is another excellent question that we plan to follow up with matching analysis in the revision.

      (5) The idea of lightweight transformer models is compelling, and is widely applicable. It has been used elsewhere. One paper that came to mind in the protein realm:

      Singh, Rohit, et al. "Learning the language of antibody hypervariability." Proceedings of the National Academy of Sciences 122.1 (2025): e2418918121.

      Yes, we are certainly not the first or only ones to advocate for such an approach. We will be sure to make that point clear in the revision, and we thank the reviewer for the example from a different domain.

      Reviewer #2 (Public review):

      Summary:

      The authors present a transformer-based model, TrASPr, for the task of tissue-specific splicing prediction (with experiments primarily focused on the case of cassette exon inclusion) as well as an optimization framework (BOS) for the task of designing RNA sequences for desired splicing outcomes.

      For the first task, the main methodological contribution is to train four transformer-based models on the 400 bp regions surrounding each splice site, the rationale being that this is where most splicing regulatory information is. In contrast, previous work trained one model on a long genomic region. This new design should help the model capture interactions between splice sites more easily. It should also help in cases of very long introns, which are relatively common in the human genome.

      TrASPr's performance is evaluated in comparison to previous models (SpliceAI, Pangolin, and SpliceTransformer) on numerous tasks including splicing predictions on GTEx tissues, ENCODE cell lines, RBP KD data, and mutagenesis data. The scope of these evaluations is ambitious; however, significant details on most of the analyses are missing, making it difficult to evaluate the strength of the evidence. Additionally, state-of-the-art models (SpliceAI and Pangolin) are reported to perform extremely poorly in some tasks, which is surprising in light of previous reports of their overall good prediction accuracy; the reasoning for this lack of performance compared to TrASPr is not explored.

      In the second task, the authors combine Latent Space Bayesian Optimization (LSBO) with a Transformer-based variational autoencoder to optimize RNA sequences for a given splicing-related objective function. This method (BOS) appears to be a novel application of LSBO, with promising results on several computational evaluations and the potential to be impactful on sequence design for both splicing-related objectives and other tasks.

      We thank Reviewer #2 for this detailed summary and positive view of our work. The main issue raised in this summary concerns the evaluations: the reviewer finds details of the evaluations missing and finds it surprising that SpliceAI and Pangolin perform poorly on some of the tasks. In general, we made a concerted effort to include the required details, including code and data tables, but will be sure to include more details based on the specific questions and comments listed below. As for the perceived performance issues of Pangolin and SpliceAI, we believe these may stem from our not making clear which tasks they perform well on and which they do not. We give more details below.

      Strengths:

      (1) A novel machine learning model for an important problem in RNA biology with excellent prediction accuracy.

      (2) Instead of being based on a generic design as in previous work, the proposed model incorporates biological domain knowledge (that regulatory information is concentrated around splice sites). This way of using inductive bias can be important to future work on other sequence-based prediction tasks.

      Weaknesses:

      (1) Most of the analyses presented in the manuscript are described in broad strokes and are often confusing. As a result, it is difficult to assess the significance of the contribution.

      We made an effort to describe the tasks specifically and in detail, including making their code and data available. Still, it is evident from the above comment that Reviewer #2 found this lacking. We will review the descriptions and improve them in light of the clarifications we include below.

      (2) As more and more models are being proposed for splicing prediction (SpliceAI, Pangolin, SpliceTransformer, TrASPr), there is a need for establishing standard benchmarks, similar to those in computer vision (ImageNet). Without such benchmarks, it is exceedingly difficult to compare models. For instance, Pangolin was apparently trained on a different dataset (Cardoso-Moreira et al. 2019), and using a different processing pipeline (based on SpliSER), than the ones used in this submission. As a result, the inferior performance of Pangolin reported here could potentially be due to subtle distribution shifts. The authors should add a discussion of the differences in the training sets and whether they affect their comparisons (e.g., in Figure 2). They should also consider adding a table summarizing the various datasets used in their previous work for training and testing. Publishing their training and testing datasets in an easy-to-use format would be a fantastic contribution to the community, establishing a common benchmark to be used by others.

      There are several good points to unpack here. First, we agree that a standard benchmark would be useful to include, and we will work to create and include one for the revision. That said, we note that unlike the example given by Reviewer #2 (ImageNet), there are no standards for splicing prediction tasks. There are, in fact, different task definitions with different inputs and outputs, as we briefly covered in the introduction.

      Second, regarding different data and distribution shifts as potential reasons for Pangolin's performance differences: we originally evaluated Pangolin after retraining it with MAJIQ-based quantifications and found no significant changes. We will include a more detailed analysis of Pangolin retrained in this way in the revision. We also note that Pangolin's original training involved significantly more data, as it was trained on four species with four tissues each, and we only evaluated it on three of those tissues (for human), in exons the authors deemed test data. That said, we very much agree that retraining Pangolin as mentioned above is warranted, as is clearly listing what data were used for training, as suggested by the reviewer.

      (3) Related to the previous point, as discussed in the manuscript, SpliceAI and Pangolin are not designed to predict the PSI of cassette exons. Instead, they assign a "splice site probability" to each nucleotide. Converting this to a PSI prediction is not obvious, and the method chosen by the authors (averaging the two probabilities (?)) is likely not optimal. It would be interesting to see what happens if an MLP is used on top of the four predictions (or the outputs of the top layers) from SpliceAI/Pangolin. This could also indicate where the improvement in TrASPr comes from: is it because TrASPr combines information from all four splice sites? Also, consider fine-tuning Pangolin on cassette exons only (as you do for your model).

      As mentioned above, we originally tried retraining Pangolin with MAJIQ PSI values without observing much difference, but we will repeat this and include the results in the revision. Combining four different SpliceAI models, as proposed by the reviewer, amounts to a new kind of model, one that takes four large ResNets and combines them with annotation. Relatedly, we did try replacing the transformers in our ablation study. The reviewer's suggestion is another interesting architecture to try, but since this model does not yet exist, it would likely require some adjustments. Given that, we view adding such a new model architecture as beyond the scope of this work.

      (4) L141, "TrASPr can handle cassette exons spanning a wide range of window sizes from 181 to 329,227 bases - thanks to its multi-transformer architecture." This is reported to be one of the primary advantages compared to existing models. Additional analysis should be included on how TrASPr performs across varying exon and intron sizes, with comparison to SpliceAI, etc.

      Yes, that is a good suggestion, similar to one made by Reviewer #1 as well. We plan to include such analysis in the revision. 

      (5) L171, "training it on cassette exons". This seems like an important point: previous models were trained mostly on constitutive exons, whereas here the model is trained specifically on cassette exons. This should be discussed in more detail.

      Previous models were not trained exclusively on constitutive exons, and Pangolin specifically was trained with its own version of junction usage across tissues. That said, the reviewer's point about the need for matched training and testing is valid (and similar to points made above). As noted above, we plan to include Pangolin trained on our PSI values for comparison.

      (6) L214, ablations of individual features are missing.

      OK

      (7) L230, "ENCODE cell lines", it is not clear why other tissues from GTEx were not included.

      The task here was to assess predictions under very different conditions; hence, we tested on completely different data from human cell lines rather than on similar tissue samples. That said, we can also evaluate on unseen GTEx tissues.

      (8) L239, it is surprising that SpliceAI performs so badly, and might suggest a mistake in the analysis. Additional analysis and possible explanations should be provided to support these claims. Similarly, the complete failure of SpliceAI and Pangolin is shown in Figure 4d.

      Line 239 refers to predicting relative inclusion levels between competing 3’ and 5’ splice sites. We admit we too expected this to be better for SpliceAI and Pangolin and will be sure to recheck for bugs, but to be fair we are not aware of a similar assessment being done for either of those algorithms (i.e. relative inclusion for 3’ and 5’ alternative splice site events).

      One issue we ran into, reflected in Reviewer #2's comments, is the mix between tasks that SpliceAI and Pangolin excel at and other tasks at which they should not necessarily be expected to excel. Both algorithms focus on cryptic splice site creation/disruption. This has been the focus of those papers and subsequent applications. While Pangolin added tissue specificity to SpliceAI training, the authors themselves admit “...predicting differential splicing across tissues from sequence alone is possible but remains a considerable challenge and requires further investigation”. The actual performance on this task is not included in Pangolin’s main text, but we refer Reviewer #2 to supplementary figure S4 in that manuscript to get a sense of Pangolin’s reported performance on this task. Similarly, Figure 4d is for predicting *tissue specific* regulators. We do not think it is surprising that SpliceAI (tissue agnostic) and Pangolin (a slight improvement over SpliceAI in tissue-specific predictions) do not perform well on this task. Nor do we find the results in Figure 4C surprising. These are for mutations that slightly alter the inclusion level of an exon, which SpliceAI was not trained on, as it was simply trained on yes/no splice site predictions. As noted, and as we will stress in the revision, training Pangolin on this dataset as we did for TrASPr gives similar performance. This is also to be expected: Pangolin is constructed to capture changes in PSI, these changes are not even tissue-specific in the CD19 data, and the model has no lack of capacity to generalize from the training set, just like TrASPr. In fact, if only combinations of known mutations seen during training are used, a simple regression model gives a correlation of ~92-95% (Cortés-López et al 2022).
      In summary, we believe that a better understanding of what one can realistically expect from models such as SpliceAI, Pangolin, and TrASPr will go a long way toward ensuring they are understood and used effectively. We will try to improve on that in the revision.

      (9) BOS seems like a separate contribution that belongs in a separate publication. Instead, consider providing more details on TrASPr.

      We thank the reviewer for the suggestion. We agree those are two distinct contributions and we indeed considered having them as two separate papers. However, there is strong coupling between the design algorithm (BOS) and the predictor that enables it (TrASPr). This coupling is both conceptual (TrASPr as a “teacher”) and practical in terms of evaluations. While we use experimental data (experiments done involving Daam1 exon 16, CD19 exon 2) we still rely heavily on evaluations by TrASPr itself. A completely independent evaluation would have required a high-throughput experimental system to assess designs, which is beyond the scope of the current paper. For those reasons we eventually decided to make it into what we hope is a more compelling combined story about generative models for prediction and design of RNA splicing. 

      (10) The authors should consider evaluating BOS using Pangolin or SpliceTransformer as the oracle, in order to measure the contribution to the sequence generation task provided by BOS vs TrASPr.

      We can definitely see the logic behind trying BOS with different predictors. That said, as we note above most of BOS evaluations are based on the “teacher”. As such, it is unclear what value replacing the teacher would bring. We also note that given this limitation we focus mostly on evaluations in comparison to existing approaches (genetic algorithm or random mutations as a strawman).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      Fleming et al. present the first proteomics-based attempt to identify the possible mechanism of action of the ALS-linked molecular chaperone DNAJC7 in pathology. Impressively, it is the first report of DNAJC7 interactome studies, using a suitable iPSC-derived lower motor neuron model. Using a co-immunoprecipitation approach, the authors found that the interactome of DNAJC7 is predominantly composed of proteins engaged in the stress response, and that it is also enriched in RNA-binding proteins. The authors also created a DNAJC7 haploinsufficiency cellular model and show the resulting increased insolubility of HNRNPU protein, which disrupts its function, as shown by analysis of its transcriptional targets. Finally, this study uses pharmacological agents to test the effect of decreased DNAJC7 expression on the cellular response to proteotoxic stress and finds evidence that DNAJC7 regulates the activation of Heat shock factor 1 (HSF1) protein under stress conditions.

      Strengths

      (1) This study uses the best model so far to study the interactome and possible mechanism of action of the DNAJC7 molecular chaperone: an iPSC-derived cellular model of motor neurons. Furthermore, the authors also looked into available transcriptome databases of ALS patient samples to further test whether their findings may be relevant to pathology.

      (2) The authors should be applauded for how explicit they are about the sample sizes, protocols, and statistical tests used throughout this manuscript. This will help the whole field in its efforts to reliably replicate the results of this study.

      We thank the reviewer for highlighting the strengths of our study.

      Weaknesses

      (1) The most significant caveat of interactome experiments inherently comes from the method of choice. It is possible that by using the co-purification approach of DNAJC7 IP the resulting pool of binding partners is depleted in proteins that interact with DNAJC7 weakly or transiently. An alternative approach presumably more sensitive towards weaker binders could use the TurboID-based proximity-labeling method.

      The reviewer raises a valid point that TurboID-based proximity biotinylation could be a more sensitive approach for identifying DNAJC7 protein-protein interactions compared to IP-MS. We agree that this strategy could be better suited to detect weak or transient interactions, and we have previously used it to characterize protein nanoenvironments and interactomes in vitro and in vivo (Wang et al. Mol Psychiatry 2024, Quan et al. mBio 2024). However, proximity biotinylation also has significant limitations, such as potential artifacts due to overexpression and high background levels. We selected the IP-MS approach to identify DNAJC7 binding partners in neurons without the need of genetically modifying or over-expressing DNAJC7.

      (2) The authors mention in Results (and Figure 2D) that HNRNPA1 was identified as DNAJC7-interacting protein in their co-IP experiments, however, an identifier for this protein cannot be found in Figure 1C and Table S1 listing the proteomics results. Could the authors appropriately update Figure 1C and Table S1, or if HNRNPA1 wasn't really a hit then remove it from listed HNRNPs?

      We apologize for the confusion. HNRNPA1 was pulled down exclusively with DNAJC7 in 2/3 independent experiments and was initially included in our list of targets. However, in our final and most stringent analysis we only considered proteins that appeared in 3/3 experiments and thus HNRNPA1 was filtered out of Figure 1C and Table S1. We will therefore remove it from Figure 2D in the revised manuscript.

      (3) No further validation of DNAJC7-interacting proteins from the heat-shock protein (HSP) family. Current validation of mass spectrometry-identified proteins comes from IP-western blots with antibodies against HSPs. It would be interesting to further inspect possible interactions of these proteins by inspecting co-localization with immunocytochemistry.

      As the reviewer points out we did in fact validate the interaction of DNAJC7 with HSP90 and HSP70 (HSP90AB1 and HSPA1A) by IP-WB as shown in Fig 1F. We agree that examining co-localization of these proteins by immunocytochemistry (ICC) would be important to investigate. However, we have been unable to do this due to technical limitations. Specifically, we have tried to perform ICC using 6 commercially available DNAJC7 antibodies and have so far been unsuccessful. In our hands the DNAJC7 ICC signal appears to be non-specific as it is not reduced when using DNAJC7 knockout and knockdown cells as controls.

      (4) Similarly, the observation of DNAJC7 haploinsufficiency causing an increase in HNRNPU insolubility could be also easily further confirmed by checking for the emergence of "puncta" under a fluorescence microscope, in addition to provided WB experiments from MN lysates.

      This is a good suggestion, and we can assess the emergence of HNRNPU "puncta" by ICC in DNAJC7 mutant iPSC-derived neurons and/or postmortem sporadic ALS patient tissue.

      (5) I would recommend that the authors also provide with this manuscript a complete dataset (possibly in the form of a table, presented similarly to Table S1) resulting from the experiments presented in Figures 2F and S2D. The information on upregulated and downregulated targets in their DNAJC7 haploinsufficiency model would be a valuable resource for the field and enable further investigations.

      This is a good suggestion and in the revised version we will provide in Table S2 the dataset presented in Figs. 2F and S2D.

      Reviewer #2 (Public review):

      Summary:

      The manuscript titled "The ALS-associated co-chaperone DNAJC7 mediates neuroprotection against proteotoxic stress by modulating HSF1 activity" describes experiments carried out in iPS cells re-differentiated into motor neurons (iNeurons, MNs) seeking to assess the functions of the J protein DnaJC7 in proteostasis. This study also investigates how an ALS-associated mutant variant (R156X) alters DnaJC7 function. The proteomic studies identify proteins interacting with DnaJC7. Using mRNA profiling in haplo-insufficient cells (+/R156X) compared to wild-type cells, the study seeks to identify pathways modulated by partial loss of DnaJC7 function. Studies in the DnaJC7 haplo-insufficient cells also indicate changes in the properties of ALS-associated proteins, such as HNRNPU and Matrin3, both of which are involved in the regulation of gene expression. The study also shows data indicating that DnaJC7 haploinsufficiency sensitizes cells to proteostatic stress induced by proteasome inhibition with MG132 and Hsp90 inhibition with ganetespib. Lastly, the study investigates how DnaJC7 modulates the activity of the heat shock transcription factor (Hsf1) and thus the heat shock response.

      Strengths

      (1) The manuscript is well presented, and most of the data are of high quality and convincing. The figures and supplementary figures are clear and easy to follow.

      (2) This study overall provides important new insights into a mostly underexplored molecular co-chaperone and its role in proteostasis. The proteomic and transcriptomic experiments certainly advance our understanding of DnaJC7. The MN model is well-suited for these studies addressing the role of DnaJC7, particularly regarding ALS. The haplo-insufficient MNs are also a suitable model to study a potential loss of function mechanism caused by (some) fALS-associated mutants in ALS, such as the R156X mutation used here.

      (3) Since so little is known about DnaJC7 function, the exploratory approaches applied here are particularly useful.

      We thank the reviewer for highlighting the strengths of our study.

      Weaknesses

      (1) Without follow-up studies, however, e.g., with select interacting proteins, the study provides merely a descriptive list of possible interactions without mechanistic insights. Also, most interactions have not been extensively (only a few examples) validated by other methods or individual experiments.

      We appreciate the reviewer's concern and agree that there are several intriguing DNAJC7 interactors worth studying further; that is why we wanted to share this resource with the broader community as quickly as possible. As this is the first study focused on DNAJC7 and its link to ALS, we could not investigate multiple potential interactors and instead focused on two: HNRNPU and HSP70/HSP90, associated with RNA metabolism and the stress response respectively, as these two pathways have previously been implicated in ALS pathogenesis. We provide validation of these interactions and some mechanistic insight into how DNAJC7 haploinsufficiency impairs their function.

      A major limitation of the study in its current form is that none of the experimental approaches allow for assessing the specific functions of DnaJC7. In the absence of specificity controls, e.g., other J proteins or HOP, which, like DnaJC7, contains TPR domains and can interact with Hsp70 and Hsp90, it remains unclear whether the proposed functions of DnaJC7 are specific/unique or shared by other J proteins or molecular chaperones. Accordingly, it would be highly informative to add experiments assessing whether some of the reported DnaJC7 protein-protein interactions and the transcriptional alterations in haplo-insufficient cells are DnaJC7-specific or also occur with other J proteins or molecular chaperones. This seems particularly important to discern specific DnaJC7 functions from general effects caused by impaired proteostasis.

      We agree with the reviewer that this is a very interesting question, as, for example, mutations in DNAJC6 can cause rare forms of Parkinson’s disease¹. However, addressing the functional overlap of DNAJC7 with other J proteins such as DNAJC6 would require substantial time and resources and is beyond the scope of the current manuscript.

      It would be informative to explore how cellular stress (e.g., MG132 treatment) alters DnaJC7 interactions with other proteins (J proteins, HOP), ideally in additional/comparative proteomic studies. The mechanism underlying the proposed regulation of Hsf1 by DnaJC7 is not quite clear to me (Figures 4 A-I). There is no evidence of a direct physical interaction between DnaJC7 and Hsf1 in the proteomic data or elsewhere. It seems plausible that Hsf1/HSR dysregulation in the haplo-insufficient cells might be due to rather indirect effects, e.g., increased protein misfolding. Also, additional data showing differential activation of Hsf1 in +/+ versus +/- cells would strengthen this part, e.g., showing differences in Hsf1 trimerization, Hsp70 interactions, nuclear localization, etc.

      The reviewer makes two good points here. First, we agree that we should provide additional data to better understand the differential activation of HSF1 in DNAJC7 heterozygous neurons, and we will focus on this question during the revision. We also agree that the mechanism underlying the regulation of HSF1 by DNAJC7 is not well defined, and we acknowledge it could be indirect. Of note, HSF1 activation is regulated by HSP70, of which DNAJC7 is a co-chaperone. We will attempt to define this mechanism better during the revision.

      The manuscript might also benefit from considering the literature showing an unusually inactive HSR and Hsf1 activity in motor neurons (e.g. published by the Durham lab).

      Yes, we did in fact note this in our Discussion: “At the same time, mouse MNs have previously been shown to maintain a high threshold of induction of the HSF1-mediated stress response relative to other cell types including glial cells, with the suggestion that this contributes to their vulnerability to stress signals such as insoluble proteins.” We will further consider how our findings are in line with those of Durham et al. in the revised Discussion.

      The correlation with transcriptomic data from ALS patients compared to neurotypical controls (Figures 4 L, M) suggesting a direct role of Hsf1/HSR seems unlikely at this point. In my view, the transcriptional dysregulation in ALS patients could be unrelated to Hsf1 dysregulation and caused by rather non-specific effects of neuronal decay in ALS.

      This is a very reasonable concern. We acknowledge that the HSF1 effects in patients could be driven by multiple other factors, including C9-DPRs. However, the point of this analysis is not to claim that DNAJC7 is the cause, but rather to highlight the importance of the HSF1 pathway, which we identified as being mis-regulated in DNAJC7 mutant neurons, as broadly relevant in sporadic and other forms of genetic ALS.

      Reviewer #3 (Public review):

      Summary:

      Fleming et al sought to better understand DNAJC7's function in motor neurons as mutations in this gene have been associated with amyotrophic lateral sclerosis (ALS). The research question is relevant and important. The authors use an induced pluripotent stem cell (iPSC) line to derive motor neurons (iMNs) finding that DNAJC7 interacts with RNA-binding proteins (RBP) in wild-type cells and a truncated mutant DNAJC7[R156*] disrupts the RBP, hnRNPU, by promoting its accumulation into insoluble fractions. Given that DNAJC7 is predicted to regulate stress responses, the authors then find that DNAJC7[R156*] expression sensitizes the iMNs to proteosomal stress by disrupting the expression of the key heat stress response regulator, HSF1. These findings support that loss-of-function mutations in DNAJC7 will indeed sensitize motor neurons to proteotoxic stress, potentially driving ALS. The association with RBPs, which routinely are found to be disrupted in ALS, is of interest and warrants further study.

      Strengths

      (1) The research question is relevant and important. The authors provide interesting data that DNAJC7 mutations impact two important features in ALS, the dysregulation of RNA binding proteins and the sensitivity of motor neurons to proteotoxic stress.

      (2) The authors provide solid data to support their findings and the assays are appropriate.

      We thank the reviewer for highlighting the strengths of our study.

      Weaknesses

      (1) The authors rely on a single iPSC line throughout the text, using the same line to make the mutation-carrying cells. iPSCs are highly variable and at minimum 3 lines, typically 5 lines, should be used to define consistent findings. This work would be greatly strengthened if 3 or more lines were used to confirm consistent effects. This is particularly concerning given that iPSCs were differentiated using growth factors versus genetic induction. Growth-factor-based differentiations are more variable.

      We will substantiate the major findings by using additional models and genetic backgrounds during the revision. However, our experiments use isogenic controls and extensive quality-control assays (on-target and off-target analysis, whole-genome sequencing, karyotyping, etc.) to ensure that our isogenic lines are genomically identical, other than the DNAJC7 mutation, and thus any phenotypes are likely caused by mutant DNAJC7 itself.

      (2) The authors argue that HSF1 and its targets are downregulated in sporadic ALS and mutant C9orf72 ALS. The first concern is that these transcriptomics data were derived from cortical tissue which does not contain motor neurons (Pineda et al. 2024 Cell 187: 1971-1989.e1916). The second concern is that the inclusion of C9orf72 mutant tissue is not well justified as (1) this mutation is associated with an upregulation of HSF1 and its targets in patients (Mordes et al, Acta Neuropathol Commun 2018 6(1):55; Lee et al Neuron 2023 111(9):1381-1390) and (2) the C9orf72 mutation is associated with an ALS/FTD spectrum disorder defined by TDP-43 pathology. Disease mechanisms associated with this spectrum disorder may not overlap with traditional ALS, which is typically defined by SOD1 pathology.

      SOD1 pathology represents only a small fraction (<2%) of all ALS patients and is therefore not traditional ALS. The majority (~97%) of sporadic and familial ALS cases (including C9orf72 but excluding SOD1 and FUS cases) are uniformly characterized by TDP-43 pathology. Nevertheless, we do agree that it would be better to assess spinal cord data, but unfortunately such single-cell datasets from ALS patients do not currently exist. We acknowledge that the HSF1 effects in patients could be driven by multiple other factors, including C9-DPRs. However, the point of this analysis is not to claim that DNAJC7 is the cause, but rather to highlight the importance of the HSF1 pathway, which we identified as being mis-regulated in DNAJC7 mutant neurons, as being broadly relevant in sporadic and other forms of genetic ALS.

      (3) As a whole, the findings are mechanistically disjointed, and additional experiments or discussion would help to connect the dots a bit more.

      We will revise the manuscript with additional experiments and discussion to better connect the dots.

      Citations

      (1) Kurian, M. A. & Abela, L. in GeneReviews® (eds M. P. Adam et al.) (University of Washington, Seattle, 1993–2025).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) While the study demonstrates that ZSS has protective effects across a wide range of animal models, including AD, FTD, DLB, PD, and both young and aged mice, it is broad and lacks a detailed investigation into the underlying mechanisms. This is the most significant concern.

      We appreciate this comment. We recognize that elucidating the mechanism is an important research topic, and we are currently working on it. The purpose of publishing this paper at this time is to inform the public as soon as possible about natural materials and methods that may be effective in preventing dementia and neurodegenerative diseases, and to encourage similar research.

      (2) The authors highlight that the non-extracted simple crush powder of ZSS shows more substantial effects than its hot water extract and extraction residue. However, the manuscript provides very limited data comparing the effects of these three extracts.

      Certainly, it would be better to compare them in several different models, but we believe that important results have already been obtained in tau Tg mice, and comparative data in other models are just additive and confirmatory.

      (3) The authors have not provided a rationale for the dosing concentrations used, nor have they tested the effects of the treatment in normal mice to verify its impact under physiological conditions.

      As described in the Materials and Methods section, the dosage was determined based on the results of preliminary experiments. The beneficial effects in normal mice are shown in Figure 5.

      (4) Regarding the assessment of cognitive function in mice, the authors only utilized the Morris Water Maze (MWM) test, which includes a five-day spatial learning training phase followed by a probe trial. The authors focused solely on the learning phase. However, it is relevant to note that data from the learning phase primarily reflect the learning ability of the mice, while the probe trial is more indicative of memory. Therefore, it is essential that probe trial data be included for a more comprehensive analysis. A justification should be included to explain why the latency on day 1 is about 50 s rather than 60 s.

      We agree that it is better to include the results of the probe test. We did not include them this time, but we would like to include them in the future. In the memory acquisition training, five trials were performed per day. Since the mice learned the location of the platform during the first five trials, the latency on the first day became around 50 seconds.
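      The authors' explanation can be sketched as follows, assuming a 60 s cutoff per trial (implied by the reviewer's comment) and a daily score computed as the mean of the five trials. The latencies used here are hypothetical values for illustration only, not data from the study.

```python
from statistics import mean

# Assumption: each trial is stopped at 60 s if the platform has not been found.
TRIAL_CUTOFF_S = 60.0


def daily_mean_latency(trial_latencies_s):
    """Mean escape latency for one training day, with each trial capped at the cutoff."""
    capped = [min(t, TRIAL_CUTOFF_S) for t in trial_latencies_s]
    return mean(capped)


# Hypothetical day-1 latencies (s) for one mouse; NOT data from the study.
# The first trials hit the cutoff; later trials are faster as the mouse learns.
day1_trials = [60.0, 60.0, 55.0, 40.0, 35.0]
print(daily_mean_latency(day1_trials))
```

      Only if the mouse failed every trial on day 1 would the daily mean reach the 60 s cutoff; any within-day learning pulls it below that value.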

      (5) The BDNF immunohistochemical staining in the manuscript appears to be non-specific.

      We cannot understand the basis for saying it is non-specific.

      (6) The central pathological regions in PD are the substantia nigra and striatum. Please replace the staining results from the cortex and hippocampus with those from these regions in the PD model.

      We examined the substantia nigra and found that synuclein pathology appeared in Tg mice and was suppressed by ZSS administration. However, because we did not investigate the striatum, we decided not to show the results for the nigrostriatal system this time. Instead, we thought that we could demonstrate the inhibitory effect of ZSS on synuclein pathology by showing the results for the cortex and hippocampus, which showed early functional decline in these mice (Fig. 4E).

      Reviewer #2 (Public review):

      The authors' study lacked an in-depth exploration of mechanisms, including changes in intracellular signal transduction, drug targets, and drug toxicity detection.

      We appreciate this comment. We understand that the mechanism, targets, and toxicity are important issues to be considered in the future.

      Reviewer #3 (Public review):

      However, this work did not include a mechanistic study or target data on ZSS, and PK data were also not included. Mechanistic or target studies and a PK study are suggested; a human PK study is preferred over mice or rats. For example, the main active ingredients and their plasma concentrations could be used in this context to study the pharmacological mechanisms of ZSS.

      We appreciate this comment. We understand that the mechanism and target are important issues to consider in the future. As the reviewer pointed out, to conduct PK studies, we must first identify the active ingredients. Unfortunately, we have not been able to identify them yet.

      Reviewer #2 (Recommendations for the authors):

      The authors have proved that ZSS has neuroprotective effects through rigorous animal experiments. However, ZSS contains other active substances besides jujuboside A, jujuboside B, and spinosin, which is more concerning. More critical data may be obtained if experiments have been designed to search for active substances.

      We appreciate this suggestion. We recognize that identifying the true active ingredients is a very important issue. Future studies will be designed to identify them and elucidate their mechanism of action.

    1. Author response:

      The following is the authors’ response to the original reviews.

      General responses:

      The authors sincerely thank all the reviewers for their valuable and constructive comments. We also apologize for the long delay in providing this rebuttal due to logistical and funding challenges. In this revision, we modified the bipolar gradients from a single direction to all three directions. Additionally, in response to the concerns regarding data reliability, we conducted a thorough examination of each step in our data processing pipeline. In the original processing workflow, the projection-onto-convex-set (POCS) method was used for partial Fourier reconstruction. Upon examination, we found that applying the POCS method after parallel image reconstruction significantly altered the signal and resulted in a considerable loss of functional features. Furthermore, the original scan protocol employed a TE of 46 ms, which is notably longer than the typical TE of 33 ms. A prolonged TE can increase the ratio of extravascular to intravascular contributions. Importantly, the impact of TE on the efficacy of phase regression remains unclear, introducing potential confounding effects. To address these issues, we revised the protocol by shortening the TE from 46 ms to 39 ms. This adjustment was achieved by changing the SMS factor to 3 and the in-plane acceleration rate to 3, thereby minimizing the confounding effects associated with an extended TE.

      Following these changes, we recollected task-based fMRI data (N=4) and resting-state fMRI data (N=14) under the updated protocol. Using the revised dataset, we validated layer-specific functional connectivity (FC) through seed-based analyses. These analyses revealed distinct connectivity patterns in the superficial and deep layers of the primary motor cortex (M1), with statistically significant inter-layer differences. Furthermore, additional analyses with a seed in the primary sensory cortex (S1) corroborated the robustness and reliability of the revised methodology. We also changed the ‘directed’ functional connectivity in the title to ‘layer-specific’ functional connectivity, as drawing conclusions about directionality requires auxiliary evidence beyond the scope of this study.

      We provide detailed responses to the reviewers’ comments below.

      Reviewer #1 (Public Review):

      Summary:

      (1) This study aims to provide imaging methods for users of the field of human layer-fMRI. This is an emerging field with 240 papers published so far. Contrary to what is implied in the manuscript, 3T is well represented among those papers; see, e.g., the papers below, which are not cited in the manuscript. Thus, the claim on the impact of developing 3T methodology for wider dissemination is not justified, specifically because some of the previous papers perform whole-brain layer-fMRI (also at 3T) with more efficient and more established procedures.

      3T layer-fMRI papers that are not cited:

      Taso, M., Munsch, F., Zhao, L., Alsop, D.C., 2021. Regional and depth-dependence of cortical blood-flow assessed with high-resolution Arterial Spin Labeling (ASL). Journal of Cerebral Blood Flow and Metabolism. https://doi.org/10.1177/0271678X20982382

      Wu, P.Y., Chu, Y.H., Lin, J.F.L., Kuo, W.J., Lin, F.H., 2018. Feature-dependent intrinsic functional connectivity across cortical depths in the human auditory cortex. Scientific Reports 8, 1-14. https://doi.org/10.1038/s41598-018-31292-x

      Lifshits, S., Tomer, O., Shamir, I., Barazany, D., Tsarfaty, G., Rosset, S., Assaf, Y., 2018. Resolution considerations in imaging of the cortical layers. NeuroImage 164, 112-120. https://doi.org/10.1016/j.neuroimage.2017.02.086

      Puckett, A.M., Aquino, K.M., Robinson, P.A., Breakspear, M., Schira, M.M., 2016. The spatiotemporal hemodynamic response function for depth-dependent functional imaging of human cortex. NeuroImage 139, 240-248. https://doi.org/10.1016/j.neuroimage.2016.06.019

      Olman, C.A., Inati, S., Heeger, D.J., 2007. The effect of large veins on spatial localization with GE BOLD at 3 T: Displacement, not blurring. NeuroImage 34, 1126-1135. https://doi.org/10.1016/j.neuroimage.2006.08.045

      Ress, D., Glover, G.H., Liu, J., Wandell, B., 2007. Laminar profiles of functional activity in the human brain. NeuroImage 34, 74-84. https://doi.org/10.1016/j.neuroimage.2006.08.020

      Huber, L., Kronbichler, L., Stirnberg, R., Ehses, P., Stocker, T., Fernández-Cabello, S., Poser, B.A., Kronbichler, M., 2023. Evaluating the capabilities and challenges of layer-fMRI VASO at 3T. Aperture Neuro 3. https://doi.org/10.52294/001c.85117

      Scheeringa, R., Bonnefond, M., van Mourik, T., Jensen, O., Norris, D.G., Koopmans, P.J., 2022. Relating neural oscillations to laminar fMRI connectivity in visual cortex. Cerebral Cortex. https://doi.org/10.1093/cercor/bhac154

      We thank the reviewer for listing these eight 3T layer-fMRI papers. The primary goal of our work is to develop a methodology for brain-wide, layer-dependent resting-state functional connectivity at 3T. Upon review of the cited papers, we found that:

      (1) One study (Lifshits et al.) was not an fMRI study.

      (2) One study (Olman et al.) was conducted at 7T, not 3T.

      (3) Two studies (Taso et al. and Wu et al.) employed relatively large voxel sizes (1.6 × 2.3 × 5 mm³ and 1.5 mm isotropic, respectively), which limits layer specificity.

      (4) Only one of the listed studies (Huber et al., Aperture Neuro 2023) provides coverage of more than half of the brain.

      While each of these studies offers valuable insights, the VASO study by Huber et al. is the most relevant to our work, given its brain-wide coverage. However, the VASO method employs a relatively long TR (14.137 s), which may not be optimal for resting-state functional connectivity analyses.

      To address these limitations, our proposed method achieves submillimeter resolution, layer specificity, brain-wide coverage, and a significantly shorter TR (<5 s) altogether. We believe this advancement provides a meaningful contribution to the field, enabling broader applicability of layer-fMRI at 3T.

      (2) The authors implemented a sequence with many nice features, including their own SMS EPI, diffusion bipolar pulses, and eye-saturation bands, and they built their own reconstruction around it. This is not trivial. Only a few labs around the world have this level of engineering expertise. I applaud this technical achievement. However, I doubt that any of this is the right tool for layer-fMRI, nor does it represent an advancement for the field. In the thermal-noise-dominated regime of sub-millimeter fMRI (especially at 3T), it is established to use 3D readouts over 2D (SMS) readouts. While it is not trivial to implement SMS, the vendor implementations (as well as the CMRR and MGH implementations) are already the most widely applied across current fMRI studies. The authors' work on this does not address any previous shortcomings in the field.

      We would like to thank the reviewer for their comments and the recognition of the technical efforts in implementing our sequence. We would like to address the points raised:

      (1) We completely agree that in-house implementation of existing techniques does not constitute an advancement for the field. We did not claim otherwise in the manuscript. Our focus was on the development of a method for brain-wide, layer-dependent resting-state functional connectivity at 3T, as mentioned in the response above.

      (2) The reviewer stated that "it is established to use 3D readouts over 2D (SMS) readouts". This is a strong claim, and we believe it requires robust evidence to support it. While it is true that 3D readouts can achieve higher tSNR in certain regions, such as the central brain, as shown in the study by Vizioli et al. (ISMRM 2020 abstract; https://cds.ismrm.org/protected/20MProceedings/PDFfiles/3825.html), higher tSNR does not necessarily equate to improved detection power in fMRI studies. For instance, Le Ster et al. (PLOS ONE, 2019; https://doi.org/10.1371/journal.pone.0225286) demonstrated that while 3D EPI had higher tSNR in the central brain, SMS EPI produced higher t-scores in activation maps.

      (3) When choosing between SMS EPI and 3D EPI, multiple factors should be taken into account, not just tSNR. For example, SMS EPI and 3D EPI differ in their sensitivity to motion and the complexity of motion correction. The choice between them depends on the specific research goals and practical constraints.

      (4) We are open to different readout strategies, provided they can be shown to be suitable for the research goals. In this study, we opted for 2D SMS primarily due to logistical considerations. This choice does not preclude the potential use of 3D readouts in the future if they are deemed more appropriate for the project objectives.

      The mechanism to use bi-polar gradients to increase the localization specificity is doubtful to me. In my understanding, killing the intra-vascular BOLD should make it less specific. Also, the empirical data do not suggest a higher localization specificity to me.

      We will elaborate on the mechanism and reasoning in the responses below.

      Embedding this work in the literature of previous methods is incomplete. Recent trends of vessel signal manipulation with ABC or VAPER are not mentioned. Comparisons with VASO are outdated and incorrect.

      The reproducibility of the methods and the result is doubtful (see below).

      In this revision, we updated the scan protocol and recollected the imaging data. Detailed explanations and revised results are provided in the later responses.

      I don't think that this manuscript is in the top 50% of the 240 layer-fmri papers out there.

      We respect the reviewer’s personal opinion. However, we can only address scientific comments or critiques.

      Strengths:

      See above. The authors developed their own SMS sequence with many features. This is important to the field, as it does not leave sequence development work to a few isolated monopoly labs. This work democratises SMS.

      The questions addressed here are of high relevance to the field: getting tools with good sensitivity, user-friendly applicability, and locally specific brain activity mapping is an important topic in the field of layer-fMRI.

      Weaknesses:

      (1) I feel the authors need to justify why flow crushing helps localization specificity. There is an entire family of recent papers that aim to achieve higher localization specificity by doing the exact opposite. Namely, MT or ABC fMRI aims to increase the localization specificity by highlighting the intravascular BOLD by means of suppressing non-flowing tissue. To name a few:

      Priovoulos, N., de Oliveira, I.A.F., Poser, B.A., Norris, D.G., van der Zwaag, W., 2023. Combining arterial blood contrast with BOLD increases fMRI intracortical contrast. Human Brain Mapping hbm.26227. https://doi.org/10.1002/hbm.26227.

      Pfaffenrot, V., Koopmans, P.J., 2022. Magnetization Transfer weighted laminar fMRI with multi-echo FLASH. NeuroImage 119725. https://doi.org/10.1016/j.neuroimage.2022.119725

      Schulz, J., Fazal, Z., Metere, R., Marques, J.P., Norris, D.G., 2020. Arterial blood contrast ( ABC ) enabled by magnetization transfer ( MT ): a novel MRI technique for enhancing the measurement of brain activation changes. bioRxiv. https://doi.org/10.1101/2020.05.20.106666

      Based on this literature, it seems that the proposed method will make the vein problem worse, not better. The authors could make it clearer how they reason that making GE-BOLD signals more extra-vascular weighted should help to reduce large vein effects.

      The proposed VN fMRI method employs VN gradients to selectively suppress signals from fast-flowing blood in large vessels. Although this approach may initially appear to diverge from the principles of CBV-based techniques (Chai et al., 2020; Huber et al., 2017a; Pfaffenrot and Koopmans, 2022; Priovoulos et al., 2023), which enhance sensitivity to vascular changes in arterioles, capillaries, and venules while attenuating signals from static tissue and large veins, it aligns with the fundamental objective of all layer-specific fMRI methods. Specifically, these approaches aim to maximize spatial specificity by preserving signals proximal to neural activation sites and minimizing contributions from distal sources, irrespective of whether the signals are intra- or extravascular in origin. In the context of intravascular signals, CBV-based methods preferentially enhance sensitivity to functional changes in small vessels (proximal components) while demonstrating reduced sensitivity to functional changes in large vessels (distal components). For extravascular signals, functional changes are a mixture of proximal and distal influences. While tissue oxygenation near neural activation sites represents a proximal contribution, extravascular signal contamination from large pial veins reflects distal effects that are spatially remote from the site of neuronal activity. CBV-based techniques mitigate this challenge by nonselectively suppressing signals from static tissues, thereby highlighting contributions from small vessels. In contrast, the VN fMRI method employs a targeted suppression strategy, selectively attenuating signals from large vessels (distal components) while preserving those from small vessels (proximal components). Furthermore, the use of a 3T scanner and the inclusion of phase regression in the VN approach mitigate contamination from large pial veins (distal components) while preserving signals reflecting local tissue oxygenation (proximal components).
By integrating these mechanisms, VN fMRI improves spatial specificity, minimizing both intravascular and extravascular contributions that are distal to neuronal activation sites. We have incorporated the responses into Discussion section.

      The empirical evidence for the claim that flow crushing helps with the localization specificity should be made clearer. The response magnitude with and without flow crushing looks pretty much identical to me (see Fig. 6d).

      In the new results in Figure 4, the application of VN gradients attenuated the bias towards the pial surface. Consistent with the results in Figure 4, Figure 5 also demonstrates the suppression of macrovascular signal by VN gradients.

      It's unclear to me what to look for in Fig. 5. I cannot discern any layer patterns in these maps. It's too noisy. The two maps of TE=43ms look like identical copies from each other. Maybe an editorial error?

      In this revision, the original Figure 5 has been removed. However, we would like to clarify that the two maps with TE = 43 ms in the original Figure 5 were not identical. This can be observed in the difference map provided in the right panel of the figure.

      The authors discuss bipolar crushing with respect to SE-BOLD where it has been previously applied. For SE-BOLD at UHF, a substantial portion of the vein signal comes from the intravascular compartment. So I agree that for SE-BOLD, it makes sense to crush the intravascular signal. For GE-BOLD however, this reasoning does not hold. For GE-BOLD (even at 3T), most of the vein signal comes from extravascular dephasing around large unspecific veins, and the bipolar crushing is not expected to help with this.

      The reviewer’s statement that "most of the vein signal comes from extravascular dephasing around large unspecific veins" may hold true for 7T. However, at 3T, the susceptibility-induced Larmor frequency shift is reduced by 57% relative to 7T (it scales with field strength, by a factor of 3/7), and the extravascular contribution decreases by more than 35%, as shown by Uludağ et al. 2009 (DOI: 10.1016/j.neuroimage.2009.05.051).

      Additionally, according to the biophysical models (Ogawa et al., 1993; doi: 10.1016/S0006-3495(93)81441-3), the extravascular contamination from the pial surface is inversely proportional to the square of the distance from the vessel. For a vessel diameter of 0.3 mm and an isotropic voxel size of 0.9 mm, the induced frequency shift at the next voxel is reduced by at least 36-fold relative to its value at the vessel wall, since (0.15 mm / 0.9 mm)² = 1/36. Notably, a vessel diameter of 0.3 mm is larger than most pial vessels. Theoretically, the extravascular effect contributes minimally to inter-layer dependency, particularly at 3T compared to 7T, owing to weaker susceptibility-related effects at lower field strengths. Empirically, as shown in Figure 7c, the results at M1 demonstrated that layer specificity can be achieved statistically with the application of VN gradients. We have incorporated this explanation into the Introduction and Discussion sections of the manuscript.
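      The two scaling arguments in the preceding responses can be checked numerically. This is a minimal sketch, assuming the infinite-cylinder model in which the extravascular frequency offset scales linearly with B0 and falls off as (a/r)², where a is the vessel radius and r the distance from the vessel axis; angular factors and blood oxygenation are ignored, and the function names are illustrative only.

```python
def field_ratio(b0_low_t: float, b0_high_t: float) -> float:
    """Ratio of a frequency offset that scales linearly with B0."""
    return b0_low_t / b0_high_t


def radial_falloff(radius_mm: float, distance_mm: float) -> float:
    """(a/r)^2 decay of the extravascular offset outside a cylindrical vessel.

    Returns 1.0 at the vessel wall (r == a) and smaller values further out.
    """
    if distance_mm < radius_mm:
        raise ValueError("distance must be at or outside the vessel wall")
    return (radius_mm / distance_mm) ** 2


# Offset at 3 T relative to 7 T is 3/7 (about 43%), i.e. roughly 57% lower.
reduction_3t_vs_7t = 1.0 - field_ratio(3.0, 7.0)

# Vessel of 0.3 mm diameter (a = 0.15 mm); next voxel assumed ~0.9 mm from its axis.
fold_reduction = 1.0 / radial_falloff(0.15, 0.9)

print(f"3T vs 7T reduction: {reduction_3t_vs_7t:.1%}")
print(f"fold reduction at next voxel: {fold_reduction:.1f}")
```

      Because the falloff is quadratic, moving from one to two voxels away (r = 1.8 mm) would reduce the offset by a further factor of four under the same model.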

      (2) The bipolar crushing is limited to one single direction of flow. This introduces a lot of artificial variance across the cortical folding pattern. This is not mentioned in the manuscript. There is an entire family of papers that perform layer-fMRI with black-blood imaging that solves this with a 3D contrast preparation (VAPER) that is applied across a longer time period, thus killing the blood signal while it flows across all directions of the vascular tree. Here, the signal crushing is happening with a 2D readout as a "snap-shot" crushing. This does not allow the blood to flow in multiple directions.

      VAPER also accounts for BOLD contaminations of larger draining veins by means of a tag-control sampling. The proposed approach here does not account for this contamination.

      Chai, Y., Li, L., Huber, L., Poser, B.A., Bandettini, P.A., 2020. Integrated VASO and perfusion contrast: A new tool for laminar functional MRI. NeuroImage 207, 116358. https://doi.org/10.1016/j.neuroimage.2019.116358

      Chai, Y., Liu, T.T., Marrett, S., Li, L., Khojandi, A., Handwerker, D.A., Alink, A., Muckli, L., Bandettini, P.A., 2021. Topographical and laminar distribution of audiovisual processing within human planum temporale. Progress in Neurobiology 102121. https://doi.org/10.1016/j.pneurobio.2021.102121

      If I would recommend anyone to perform layer-fMRI with blood crushing, it seems that VAPER is the superior approach. The authors could make it clearer why users might want to use the unidirectional crushing instead.

      We understand the reviewer’s concern regarding the directional limitation of bipolar crushing. As noted in the responses above, we have updated the bipolar gradient to include three orthogonal directions instead of a single direction. Furthermore, flow-related signal suppression does not necessarily require a longer time period. Bipolar diffusion gradients have been effectively used to nullify signals from fast-flowing blood, as demonstrated by Boxerman et al. (1995; DOI: 10.1002/mrm.1910340103). Their study showed that vessels with flow velocities producing phase changes greater than π radians due to bipolar gradients experience significant signal attenuation. The critical velocity for such attenuation can be calculated using the formula v_c = 1/(2γGΔδ), where γ is the gyromagnetic ratio (expressed in Hz/T), G is the gradient strength, δ is the gradient pulse width, and Δ is the time between the two bipolar gradient pulses. In the framework of Boxerman et al. at 1.5T, the critical velocity for a b value of 10 s/mm² is ~8 mm/s, resulting in a ~30% reduction in functional signal. In our 3T study, b values of 6, 7, and 8 s/mm² correspond to critical velocities of 16.8, 15.2, and 13.9 mm/s, respectively. The flow velocities in capillaries and most venules remain well below these thresholds. Notably, in our VN fMRI sequences, bipolar gradients were applied in all three orthogonal directions, whereas in Boxerman et al.'s study, the gradients were applied only in the z-direction. Given the voxel dimensions of 3 × 3 × 7 mm³ in the 1.5T study, vessels within a large voxel are likely oriented in multiple directions, meaning that only a subset of fast-flowing signals would be attenuated. Therefore, our approach is expected to induce greater signal reduction, even at the same b values as those used in Boxerman et al.'s study. We have incorporated this text into the Discussion section of the manuscript.
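      The critical-velocity relation can be illustrated with a minimal sketch. It assumes a bipolar pair of rectangular lobes of amplitude G and width δ whose onsets are separated by Δ, so the first gradient moment is GΔδ and a spin moving at constant velocity v acquires a phase of 2πγ̄GΔδv (γ̄ = γ/2π in Hz/T); setting this phase to π gives v_c = 1/(2γ̄GΔδ). The b value uses the Stejskal–Tanner expression for rectangular lobes. The gradient timing in the example is hypothetical and is not the protocol used in this study.

```python
import math

GAMMA_BAR_HZ_PER_T = 42.577478e6  # proton gyromagnetic ratio / (2*pi), in Hz/T


def flow_phase_rad(v_mm_s, g_mT_m, small_delta_ms, big_delta_ms):
    """Phase acquired by spins at constant velocity through a bipolar gradient pair."""
    g = g_mT_m * 1e-3          # T/m
    d = small_delta_ms * 1e-3  # lobe width, s
    D = big_delta_ms * 1e-3    # separation of lobe onsets, s
    v = v_mm_s * 1e-3          # m/s
    return 2.0 * math.pi * GAMMA_BAR_HZ_PER_T * g * D * d * v


def critical_velocity_mm_s(g_mT_m, small_delta_ms, big_delta_ms):
    """Velocity at which the flow-induced phase reaches pi: 1 / (2 * gamma_bar * G * Delta * delta)."""
    g = g_mT_m * 1e-3
    d = small_delta_ms * 1e-3
    D = big_delta_ms * 1e-3
    return 1e3 / (2.0 * GAMMA_BAR_HZ_PER_T * g * D * d)  # m/s converted to mm/s


def b_value_s_mm2(g_mT_m, small_delta_ms, big_delta_ms):
    """Stejskal-Tanner b value for rectangular lobes, converted from s/m^2 to s/mm^2."""
    gamma = 2.0 * math.pi * GAMMA_BAR_HZ_PER_T  # rad/(s*T)
    g = g_mT_m * 1e-3
    d = small_delta_ms * 1e-3
    D = big_delta_ms * 1e-3
    return gamma**2 * g**2 * d**2 * (D - d / 3.0) * 1e-6


# Hypothetical timing (not the study's protocol): 40 mT/m, back-to-back 4 ms lobes.
params = (40.0, 4.0, 4.0)
b = b_value_s_mm2(*params)
v_c = critical_velocity_mm_s(*params)
print(f"b = {b:.1f} s/mm^2, v_c = {v_c:.1f} mm/s")
```

      Since v_c scales as 1/(GΔδ) while b scales as G²δ²(Δ − δ/3), stronger or longer lobes raise b and lower v_c, which matches the direction of the b-value and critical-velocity pairs given in the paragraph above.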

      (3) The comparison with VASO is misleading.

      The authors claim that previous VASO approaches were limited by TRs of 8.2s. The authors might be advised to check the literature from recent years.

      Koiso et al. performed whole-brain layer-fMRI VASO at 0.8 mm with TRs of 3.9 seconds (with reliable activation), 2.7 seconds (with an unconvincing activation pattern, though), and 2.3 seconds (without activation).

      Also, whole-brain layer-fMRI BOLD at 0.5 mm and 0.7 mm has been previously performed by the Juelich group at TRs of 3.5 s (their TR definition is 'fishy', though).

      Koiso, K., Müller, A.K., Akamatsu, K., Dresbach, S., Gulban, O.F., Goebel, R., Miyawaki, Y., Poser, B.A., Huber, L., 2023. Acquisition and processing methods of whole-brain layer-fMRI VASO and BOLD: The Kenshu dataset. Aperture Neuro 34. https://doi.org/10.1101/2022.08.19.504502

      Yun, S.D., Pais‐Roldán, P., Palomero‐Gallagher, N., Shah, N.J., 2022. Mapping of whole‐cerebrum resting‐state networks using ultra‐high resolution acquisition protocols. Human Brain Mapping. https://doi.org/10.1002/hbm.25855

      Pais-Roldan, P., Yun, S.D., Palomero-Gallagher, N., Shah, N.J., 2023. Cortical depth-dependent human fMRI of resting-state networks using EPIK. Front. Neurosci. 17, 1151544. https://doi.org/10.3389/fnins.2023.1151544

      We thank the reviewer for providing these references. While the protocol with a TR of 3.9 seconds in Koiso’s work demonstrated reasonable activation patterns, it was not tested for layer specificity. Given that higher acceleration factors (AF) can cause spatial blurring, a protocol should only be eligible for comparison if layer specificity is demonstrated.

      Secondly, the TRs reported in Koiso’s study pertain only to either the VASO or BOLD acquisition, not the combined CBV-based contrast. To generate CBV-based images, both VASO and BOLD data are required, effectively doubling the TR. For instance, if the protocol with a TR of 3.9 seconds is used, the effective TR becomes approximately 8 seconds. The stable protocol used by Koiso et al. to acquire whole-brain data (94.08 mm along the z-axis) required 5.2 seconds for VASO and 5.1 seconds for BOLD, resulting in an effective TR of 10.3 seconds. The spatial resolution achieved was 0.84 mm isotropic.
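      The effective-TR arithmetic above can be written out as a small Python helper. It assumes, as stated in this response, that each CBV-weighted volume requires one VASO volume and one BOLD volume; for the 3.9-second case it further assumes the BOLD volume also takes 3.9 seconds. The function name is illustrative only.

```python
def effective_tr(tr_vaso_s, tr_bold_s):
    """Effective TR (s) of a CBV-weighted time series when every VASO volume
    must be paired with a BOLD volume to form the CBV-weighted contrast."""
    return tr_vaso_s + tr_bold_s


# TR values quoted above for the protocols of Koiso et al.
for tr_vaso, tr_bold in [(3.9, 3.9), (5.2, 5.1)]:
    print(f"VASO {tr_vaso} s + BOLD {tr_bold} s -> effective TR {effective_tr(tr_vaso, tr_bold):.1f} s")
```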

      Unfortunately, we could not find the Juelich paper mentioned by the reviewer.

      To have a more comprehensive comparison, we collated relevant literature on brain-wide layer-specific fMRI. We defined brain-wide acquisition as imaging protocols that cover more than half of the human brain, specifically exceeding 55 mm along the superior-inferior axis. We identified five studies and summarized their scan parameters, including effective TR, coverage, and spatial resolution, in Table 1.

      The authors are correct that VASO is not advised as a turn-key method for lower brain areas, including the hippocampus and subcortex. However, the authors use this word of caution, which is intended for inexperienced "users", as a statement that this cannot be performed. This statement is taken out of context: it is not from the academic literature, but is advice for the 40+ user base that wants to perform layer-fMRI as a plug-and-play routine tool in neuroscience. In fact, sub-millimeter VASO is routinely being performed by MRI physicists across all brain areas (including deep brain structures, hippocampus, etc.). See, e.g., Koiso et al. and an overview lecture from a layer-fMRI workshop that I recently attended: https://youtu.be/kzh-nWXd54s?si=hoIJjLLIxFUJ4g20&t=2401

      In this revision, we decided to focus on cortico-cortical functional connectivity and have removed the LGN-related content. Consequently, the text mentioned by the reviewer was also removed. Nevertheless, we apologize if our original description gave the impression that functional mapping of deep brain regions using VASO is not feasible. The word of caution we used is based on the layer-fMRI blog ( https://layerfmri.com/2021/02/22/vaso_ve/ ) and reflects the challenges associated with this technique, as outlined by experts like Dr. Huber and Dr. Stirnberg.

      According to the information provided, including the video, functional mapping of the hippocampus and amygdala using VASO is indeed possible but remains technically challenging. The short arterial arrival times in these deep brain regions can complicate the acquisition, requiring RF inversion pulses to cover a wider area at the base of the brain. For example, as of 2023, four or more research groups were attempting to implement layer-fMRI VASO in the hippocampus. One such study at 3T required multiple inversion times to account for inflow effects, highlighting the technical complexity of these applications. This is the context in which we used the word of caution. We are not sure whether recent advancements like MAGEC VASO have improved its applicability. As of 2024, we have not identified any published VASO studies specifically targeting deep brain structures such as the hippocampus or amygdala. Therefore, it is difficult to conclude that “sub-millimeter VASO is routinely being performed by MRI physicists on deep brain structures such as the hippocampus.”

      Thus, the authors could embed this phrasing into the context of the method they are proposing in the manuscript. For example, could the authors state whether they think their sequence has the potential to be disseminated across sites, considering that it requires slow offline reconstruction in MATLAB?

      We are enthusiastic about sharing our imaging sequence, provided its usefulness is conclusively established. However, it's important to note that without an online reconstruction capability, such as the ICE, the practical utility of the sequence may be limited. Unfortunately, we currently don’t have the manpower to implement the online reconstruction. Nevertheless, we are more than willing to share the offline reconstruction codes upon request.

      Do the authors think that the results shown in Fig. 6c are suggesting turn-key acquisition of a routine mapping tool? In my humble opinion, it looks like random noise, with most of the activation outside the ROI (in white matter).

      As we mentioned in the ‘general response’ at the beginning of the rebuttal, the POCS method for partial Fourier reconstruction caused a loss of functional features, potentially accounting for the activation in white matter. In this revision, we have modified the pulse sequence, scan protocol, and processing pipelines.

      According to the results in Figure 4, stable activation in M1 was observed at the single-subject level across most scan protocols. Yet, the layer-dependent activation profiles in M1 were spatially unstable, irrespective of the application of VN gradients. This spatial instability is not entirely unexpected, as T2*-based contrast is inherently sensitive to various factors that perturb the magnetic field, such as eye movements, respiration, and macrovascular signal fluctuations. Furthermore, ICA-based artifact removal was intentionally omitted in Figure 4 to ensure fair comparisons between protocols, leaving residual artifacts unaddressed. Inconsistency in performing the button-pressing task across sessions may also have contributed to the observed variability. These results suggest that submillimeter-resolution fMRI may not yet be suitable for reliable individual-level layer-dependent functional mapping, unless group-level statistics are incorporated to enhance robustness. We have incorporated this text into the Limitation section of the manuscript.

      (4) The repeatability of the results is questionable.

      The authors perform experiments on the robustness of the method (line 620). The corresponding results do not suggest any robustness to me. In fact, the layer profiles in Fig. 4c vs. Fig. 4d are completely opposite: peaks turn into dips and vice versa.

      The methods are not described in enough detail to reproduce these results.

      The authors mention that their image reconstruction is done "using in-house MATLAB code" (line 634). They do not post a link to github, nor do they say if they share this code.

      We thank the reviewer for the comments regarding reproducibility and data sharing. In response, we have revised the Methods section and elaborated on the technical details to improve clarity and reproducibility.

      Regarding code sharing, we acknowledge that the current in-house MATLAB reconstruction code requires further refinement to improve its readability and usability. Due to limited manpower, we have not yet been able to complete this task. However, we are committed to making the code publicly available and will upload it to GitHub as soon as the necessary resources are available.

      For data sharing, we face logistical challenges due to the large size of the dataset, which spans tens of terabytes. Platforms like OpenNeuro, for example, typically support datasets up to 10TB, making it difficult to share the data in its entirety. Despite this limitation, we are more than willing to share offline reconstruction codes and raw data upon request to facilitate reproducibility.

      Regarding data robustness, we kindly refer the reviewer to our response to the previous comment, where we addressed these concerns in greater detail.

      It is not trivial to get good phase data for fMRI. The authors do not mention how they perform the respective coil-combination.

      No data are shared for reproduction of the analysis.

      Obtaining phase data is relatively straightforward when the images are retrieved directly from raw data. For coil combination, we employed the adaptive coil combination approach described by Walsh et al. (DOI: 10.1002/(sici)1522-2594(200005)43:5<682::aid-mrm10>3.0.co;2-g). The MATLAB code for this implementation was developed by Dr. Diego Hernando and is publicly available at https://github.com/welton0411/matlab .
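      For readers unfamiliar with the adaptive combination of Walsh et al., the pure-Python sketch below shows the core idea under a simplifying assumption of white, uncorrelated coil noise (identity noise covariance): for each pixel, the coil weights are the dominant eigenvector of the signal covariance matrix accumulated over a small neighborhood, and the eigenvector's arbitrary global phase is referenced to one coil. It is not the Hernando MATLAB implementation cited above; the neighborhood radius, reference coil, synthetic data, and function names are illustrative assumptions. The phase of the combined complex image (the counterpart of MATLAB's angle) is taken with cmath.phase.

```python
import cmath
import math


def local_covariance(coil_images, x, y, radius):
    """Coil signal covariance R[i][j] = sum of s_i * conj(s_j) over a square
    neighborhood of (x, y); coil_images[c][row][col] are complex pixels."""
    n_coils = len(coil_images)
    height, width = len(coil_images[0]), len(coil_images[0][0])
    R = [[0j] * n_coils for _ in range(n_coils)]
    for yy in range(max(0, y - radius), min(height, y + radius + 1)):
        for xx in range(max(0, x - radius), min(width, x + radius + 1)):
            s = [coil_images[c][yy][xx] for c in range(n_coils)]
            for i in range(n_coils):
                for j in range(n_coils):
                    R[i][j] += s[i] * s[j].conjugate()
    return R


def dominant_eigenvector(R, n_iter=50):
    """Unit-norm principal eigenvector of a Hermitian matrix by power iteration
    (assumes the all-ones start vector is not orthogonal to it)."""
    n = len(R)
    v = [1.0 + 0j] * n
    for _ in range(n_iter):
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(abs(c) ** 2 for c in w)) or 1.0
        v = [c / norm for c in w]
    return v


def adaptive_combine(coil_images, radius=2, ref_coil=0):
    """Adaptive coil combination assuming identity noise covariance."""
    n_coils = len(coil_images)
    height, width = len(coil_images[0]), len(coil_images[0][0])
    combined = [[0j] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            w = dominant_eigenvector(local_covariance(coil_images, x, y, radius))
            # Eigenvectors are defined only up to a global phase; reference it
            # to one coil so that the combined phase is consistent across pixels.
            ref_phase = cmath.phase(w[ref_coil])
            w = [c * cmath.exp(-1j * ref_phase) for c in w]
            # Combined pixel = w^H s (conjugated weights times coil signals).
            combined[y][x] = sum(w[c].conjugate() * coil_images[c][y][x] for c in range(n_coils))
    return combined


# Synthetic 2-coil, 3x3 example: constant sensitivities, spatially varying phase.
sens = [1.0 + 0j, 0.5j]
truth = [[2 * cmath.exp(1j * 0.1 * (x + y)) for x in range(3)] for y in range(3)]
coils = [[[sens[c] * truth[y][x] for x in range(3)] for y in range(3)] for c in range(2)]
combined = adaptive_combine(coils, radius=1)
phase_map = [[cmath.phase(p) for p in row] for row in combined]
```

      With locally constant sensitivities, the combined magnitude equals the root-sum-of-squares sensitivity times the true magnitude, and the combined phase equals the true phase plus the reference coil's sensitivity phase, which is why a fixed reference coil keeps the phase map spatially consistent.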

      (5) The application of NORDIC is not validated.

      Previous applications of NORDIC in 3T layer-fMRI have met with mixed success. When not adjusted for the right SNR regime, it can result in artifactual reductions of beta scores, depending on the SNR across layers. The authors could validate their application of NORDIC and confirm that the average layer profiles are unaffected by the application of NORDIC. Also, the NORDIC version should be explicitly mentioned in the manuscript.

      Akbari, A., Gati, J.S., Zeman, P., Liem, B., Menon, R.S., 2023. Layer Dependence of Monocular and Binocular Responses in Human Ocular Dominance Columns at 7T using VASO and BOLD (preprint). Neuroscience. https://doi.org/10.1101/2023.04.06.535924

      Knudsen, L., Guo, F., Huang, J., Blicher, J.U., Lund, T.E., Zhou, Y., Zhang, P., Yang, Y., 2023. The laminar pattern of proprioceptive activation in human primary motor cortex. bioRxiv. https://doi.org/10.1101/2023.10.29.564658

      We appreciate the reviewer’s suggestion. To validate the application of NORDIC denoising in our study, we compared the BOLD activation maps before and after denoising in the visual and motor cortices, as well as the depth-dependent activation profiles in M1. These results are presented in Figure 3. The activation patterns in the denoised maps were consistent with those in the non-denoised maps but exhibited higher statistical significance. Notably, BOLD activation within M1 was only observed after NORDIC denoising, underscoring the necessity of this approach. Figure 3c shows the depth-dependent activation profiles in M1, highlighted by the green contours in Figure 3b. Both denoised and non-denoised profiles followed similar trends; however, as expected, the non-denoised profile exhibited larger confidence intervals compared to the NORDIC-denoised profile. These results confirm that NORDIC denoising enhances sensitivity without introducing distortions in the functional signal. The corresponding text has been incorporated into the Results section.

      Regarding the implementation details of NORDIC denoising, the reconstructed images were denoised using a g-factor map (function name: NIFTI_NORDIC). The g-factor map was estimated from the image time series, and the input images were complex-valued. The width of the smoothing filter for the phase was set to 10, while all other hyperparameters were retained at their default values. This information has been integrated into the Methods section for clarity and reproducibility.

      Reviewer #2 (Public Review):

      This study developed a setup for laminar fMRI at 3T that aimed to get the best from all worlds in terms of brain coverage, temporal resolution, sensitivity to detect functional responses, and spatial specificity. They used a gradient-echo EPI readout to facilitate sensitivity, brain coverage and temporal resolution. The former was additionally boosted by NORDIC denoising and the latter two were further supported by parallel-imaging acceleration both in-plane and across slices. The authors evaluated whether the implementation of velocity-nulling (VN) gradients could mitigate macrovascular bias, known to hamper the laminar specificity of gradient-echo BOLD.

      The setup allows for 0.9 mm isotropic acquisitions with large coverage at a reasonable TR (at least for block designs) and the fMRI results presented here were acquired within practical scan-times of 12-18 minutes. Also, in terms of the availability of the method, it is favorable that it benefits from lower field strength (additional time for VN-gradient implementation, afforded by longer gray matter T2*).

      The well-known double peak feature in M1 during finger tapping was used as a test-bed to evaluate the spatial specificity. They were indeed able to demonstrate two distinct peaks in group-level laminar profiles extracted from M1 during finger tapping, which was largely free from superficial bias. This is rather intriguing as, even at 7T, clear peaks are usually only seen with spatially specific non-BOLD sequences. This is in line with their simple simulations, which nicely illustrated that, in theory, intravascular macrovascular signals should be suppressible with only minimal suppression of microvasculature when small b-values of the VN gradients are employed. However, the authors do not state how ROIs were defined making the validity of this finding unclear; were they defined from independent criteria or were they selected based on the region mostly expressing the double peak, which would clearly be circular? In any case, results are based on a very small sub-region of M1 in a single slice - it would be useful to see the generalizability of superficial-bias-free BOLD responses across a larger portion of M1.

      We appreciate and understand the reviewer’s concerns. Given the small size of the hand knob region within M1 and its intersubject variability in location, defining this region automatically remains challenging. However, we applied specific criteria to minimize bias during the delineation of M1: 1) the hand knob region was required to be anatomically located in the precentral sulcus or gyrus; 2) it needed to exhibit consistent BOLD activation across the majority of testing conditions; and 3) the region was expected to show BOLD activation in the deep cortical layers under the condition of b = 0 and TE = 30 ms. Once the boundaries across cortical depth were defined, the gray matter boundaries of the hand knob region were delineated based on the T1-weighted anatomical image and the cortical ribbon mask, without reference to the BOLD activation map, to minimize potential bias in manual delineation. Based on the new criteria, the resulting depth-dependent profiles, as shown in Figure 4, are no longer superficial-bias-free.

      As repeatedly mentioned by the authors, a laminar fMRI setup must demonstrate adequate functional sensitivity to detect (in this case) BOLD responses. The sensitivity evaluation is unfortunately quite weak. It is mainly based on the argument that significant activation was found in a challenging sub-cortical region (LGN). However, it was a single participant, the activation map was not very convincing, and the demonstration of significant activation after considerable voxel-averaging is inadequate evidence to claim sufficient BOLD sensitivity. How well sensitivity is retained in the presence of VN gradients, high acceleration factors, etc., is therefore unclear. The ability of the setup to obtain meaningful functional connectivity results is reassuring, yet, more elaborate comparison with e.g., the conventional BOLD setup (no VN gradients) is warranted, for example by comparison of tSNR, quantification and comparison of CNR, illustration of unmasked-full-slice activation maps to compare noise-levels, comparison of the across-trial variance in each subject, etc. Furthermore, as NORDIC appears to be a cornerstone to enable submillimeter resolution in this setup at 3T, it is critical to evaluate its impact on the data through comparison with non-denoised data, which is currently lacking.

      We appreciate the reviewer’s comments and acknowledge that the LGN results from a single participant were not sufficiently convincing. In this revision, we have removed the LGN-related results and focused on cortico-cortical FC. To evaluate data quality, we opted to present BOLD activation maps rather than tSNR, as high tSNR does not necessarily translate to high functional significance. In Figure 3, we illustrate the effect of NORDIC denoising, including activation maps and depth-dependent profiles. Figure 4 presents activation maps acquired under different TE and b values, demonstrating that VN gradients effectively reduce the bias toward the pial surface without altering the overall activation patterns. The results in Figure 4 and Figure 5 provide evidence that VN gradients retain sensitivity while reducing superficial bias. The ability of the setup to obtain meaningful FC results was validated through seed-based analyses, identifying distinct connectivity patterns in the superficial and deep layers of the primary motor cortex (M1), with significant inter-layer differences (see Figure 7). Further analyses with a seed in the primary sensory cortex (S1) demonstrated the reliability of the method (see Figure 8). For further details on the results, including the impact of VN gradients and NORDIC denoising, please refer to Figures 3 to 8 in the Results section.

      Additionally, we acknowledge the limitations of our current protocol for submillimeter-resolution fMRI at the individual level. We found that robust layer-dependent functional mapping often requires group-level statistics to enhance reliability. This issue has been discussed in detail in the Limitations section.

      The proposed setup might potentially be valuable to the field, which is continuously searching for techniques to achieve laminar specificity in gradient echo EPI acquisitions. Nonetheless, the above considerations need to be tackled to make a convincing case.

      Reviewer #3 (Public Review):

      Summary:

      The authors are looking for a spatially specific functional brain response to visualise non-invasively with 3T (clinical field strength) MRI. They propose a velocity-nulled weighting to remove the signal from draining veins in a submillimeter multiband acquisition.

      Strengths:

      - This manuscript addresses a real need in the cognitive neuroscience community interested in imaging responses in cortical layers in-vivo in humans.

      - An additional benefit is the proposed implementation at 3T, a widely available field strength.

      Weaknesses:

      - Although the VASO acquisition is discussed in the introduction section, the VN-sequence seems closer to diffusion-weighted functional MRI. The authors should make it more clear to the reader what the differences are, and how results are expected to differ. Generally, it is not so clear why the introduction is so focused on the VASO acquisition (which, curiously, lacks a reference to Lu et al 2013). There are many more alternatives to BOLD-weighted imaging for fMRI. CBF-weighted ASL and GRASE have been around for a while, ABC and double-SE have been proposed more recently.

      The major distinction between diffusion-weighted fMRI (DW-fMRI) and our methodology lies in the b-value employed. DW-fMRI typically measures cellular swelling using b-values greater than 1000 s/mm² (e.g., 1800 s/mm²). In contrast, our VN-fMRI approach measures hemodynamic responses by employing smaller b-values specifically designed to suppress signals from fast-flowing draining veins rather than detecting microstructural changes.

      Regarding other functional contrasts, we agree that more layer-dependent fMRI approaches should be mentioned. In this revision, we have expanded the Introduction section to include discussions of the double spin-echo approach, CBV-based methods such as MT-weighted fMRI, VAPER, and ABC, and the CBF-based method ASL. Additionally, the reference to Lu et al. (2013) has been cited in the revised manuscript. The corresponding text has been incorporated into the Introduction section to provide a more comprehensive overview of alternative functional imaging techniques.

      - The comparison in Figure 2 for different b-values shows % signal changes. However, as the baseline signal changes dramatically with added diffusion weighting, this is rather uninformative. A plot of t-values against cortical depth would be much more insightful.

      - Surprisingly, the %-signal change for a b-value of 0 is not significantly different from 0 in the gray matter. This raises some doubts about the task or ROI definition. A finger-tapping task should reliably engage the primary motor cortex, even at 3T, and even in a single participant.

      - The BOLD weighted images in Figure 3 show a very clear double-peak pattern. This contradicts the results in Figure 2 and is unexpected given the existing literature on BOLD responses as a function of cortical depth.

      - Given that data from Figures 2, 3, and 4 are derived from a single participant each, order and attention effects might have dramatically affected the observed patterns. Especially for Figure 4, neither BOLD nor VN profiles are really different from 0, and without statistical values or inter-subject averaging, these cannot be used to draw conclusions.

      We appreciate the reviewer’s suggestions. In this revision, we have made significant updates to the participant recruitment, scan protocol, data processing, and M1 delineation. Please refer to the "General Responses" at the beginning of the rebuttal and the first response to Reviewer #2 for more details.

      Previously, the variation in depth-dependent profiles was calculated across upscaled voxels within a specific layer. However, due to the small size of the hand knob region, the number of within-layer voxels was limited, resulting in inaccurate estimations of signal variation. In the revised manuscript, the signal was averaged within each layer before performing the GLM analysis, and signal variation was calculated using the temporal residuals. The technical details of these changes are described in the "Materials and Methods" section. Furthermore, while the initial submission used percentage signal change for the profiles of M1, the dramatic baseline fluctuations observed previously are no longer an issue after the modifications. For this reason, we retained the use of percentage signal change to present the depth-dependent profiles. After these adjustments, the profiles exhibited a bias toward the pial surface, particularly in the absence of VN gradients.
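      The revised profile estimation (averaging within each layer first, then fitting the GLM and deriving the signal variation from the temporal residuals) can be sketched in Python as follows. This is a simplified illustration, not the authors' pipeline: it assumes a single task regressor coded 0/1 (a real analysis would convolve it with a hemodynamic response function and include nuisance regressors), uses ordinary least squares, and expresses the response as percent signal change relative to the fitted baseline; the error term ignores uncertainty in the baseline estimate.

```python
import math
import statistics


def layer_glm(voxel_series, regressor):
    """Fit y = b0 + b1 * x to the layer-averaged time series by OLS.

    voxel_series: list of voxel time series (all voxels in one layer).
    regressor: task regressor (assumed here to be a 0/1 boxcar).
    Returns (percent signal change, its standard error, t-value)."""
    n_t = len(regressor)
    # 1) Average within the layer first, so that the noise estimate comes from
    #    the temporal residuals rather than from the spread across voxels.
    y = [statistics.fmean(v[t] for v in voxel_series) for t in range(n_t)]
    # 2) Ordinary least squares for the intercept (baseline) and task amplitude.
    x_mean, y_mean = statistics.fmean(regressor), statistics.fmean(y)
    sxx = sum((x - x_mean) ** 2 for x in regressor)
    b1 = sum((x - x_mean) * (yy - y_mean) for x, yy in zip(regressor, y)) / sxx
    b0 = y_mean - b1 * x_mean
    # 3) Residual variance with n_t - 2 degrees of freedom gives the SE of b1.
    residuals = [yy - (b0 + b1 * x) for x, yy in zip(regressor, y)]
    sigma2 = sum(r * r for r in residuals) / (n_t - 2)
    se_b1 = math.sqrt(sigma2 / sxx)
    # 4) Express amplitude and error as percent of the fitted baseline.
    return 100.0 * b1 / b0, 100.0 * se_b1 / b0, b1 / se_b1
```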

      - In Figure 5, a phase regression is added to the data presented in Figure 4. However, for a phase regression to work, there has to be a (macrovascular) response to start with. As none of the responses in Figure 4 are significant for the single participant dataset, phase regression should probably not have been undertaken. In this case, the functional 'responses' appear to increase with phase regression, which is counterintuitive and deserves an explanation.

      We agree with the reviewer’s argument. In the revised results, the issues mentioned by the reviewer have largely been mitigated. The updated analyses demonstrate that phase regression effectively reduces superficial bias, as shown in Figures 4 and 5.

      - Consistency of responses is indeed expected to increase with the removal of the more variable vascular component. However, the microvascular component is always expected to be smaller than the combination of microvascular + macrovascular responses. Note that the use of %signal changes may obscure this effect somewhat because of the modified baseline. Another expected feature of BOLD profiles containing both micro- and macrovasculature is the draining towards the cortical surface. In the profiles shown in Figure 7, this is completely absent. In the group data, no significant responses to the task are shown anywhere in the cortical ribbon.

      We agree with the reviewer’s comments. In the revised manuscript, the results have been substantially updated to address the concerns raised. The original Figure 7 is no longer relevant and has been removed.

      - Although I'd like to applaud the authors for their ambition with the connectivity analysis, I feel that acquisitions that are so SNR starved as to fail to show a significant response to a motor task should not be used for brain wide directed connectivity analysis.

      We appreciate the reviewer’s comments and share the concern about SNR limitations. In the updated results presented in Figure 5, the activation patterns in the visual cortex were consistent across TEs and b values. At the motor cortex, stable activation in M1 was observed at the single-subject level across most scan protocols. However, the layer-dependent activation profiles in M1 exhibited spatial instability, irrespective of the application of VN gradients. This spatial instability is not entirely unexpected, as T2*-based contrast is inherently sensitive to factors that perturb the magnetic field, such as eye movements, respiration, and macrovascular signal fluctuations. Additionally, ICA-based artifact removal was intentionally omitted in Figure 4 to ensure fair comparisons across protocols, leaving some residual artifacts unaddressed. Variability in task performance during button-pressing sessions may have further contributed to the observed inconsistencies.

      Although these findings suggest that submillimeter-resolution fMRI may not yet be reliable for individual-level layer-dependent functional mapping, the group-level FC analyses can still yield robust results. In Figure 7, group-level statistics revealed distinct functional connectivity (FC) patterns associated with superficial and deep layers in M1. These FC maps exhibited significant differences between layers, demonstrating that VN fMRI enhances inter-layer independence. Additional FC analyses with a seed placed in S1 further validated these findings (see Figure 8).
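      The layer-specific seed-based comparison described above can be illustrated with the following Python sketch. It assumes (our assumption; the exact statistics are not specified in this response) that each subject's connectivity is the Pearson correlation between a layer-specific M1 seed time series and a target time series, Fisher z-transformed, and that the superficial-versus-deep difference is tested across subjects with a one-sample t-statistic on the per-subject z differences. Function names and data are illustrative.

```python
import math
import statistics


def pearson_r(a, b):
    """Pearson correlation between two equal-length time series."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def layer_fc_difference(superficial_seed, deep_seed, target):
    """Fisher-z connectivity of a target with superficial and deep seeds, and
    the superficial-minus-deep difference for one subject (|r| must be < 1)."""
    z_sup = math.atanh(pearson_r(superficial_seed, target))
    z_deep = math.atanh(pearson_r(deep_seed, target))
    return z_sup, z_deep, z_sup - z_deep


def one_sample_t(values):
    """Group t-statistic of per-subject z differences against zero."""
    n = len(values)
    return statistics.fmean(values) / (statistics.stdev(values) / math.sqrt(n))
```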

      The claim of specificity is supported by the observation of the double-peak pattern in the motor cortex, previously shown in multiple non-BOLD studies. However, this same pattern is shown in some of the BOLD weighted data, which seems to suggest that the double-peak pattern is not solely due to the added velocity nulling gradients. In addition, the well-known draining towards the cortical surface is not replicated for the BOLD-weighted data in Figures 3, 4, or 7. This puts some doubt about the data actually having the SNR to draw conclusions about the observed patterns.

      We appreciate the reviewer’s comments. In the updated results, the efficacy of the VN gradients is evident near the pial surface, as shown in Figures 4 and 5. In Figure 4, comparing the second and third columns (b = 0 and b = 6 s/mm², respectively, at TE = 38 ms), the percentage signal change in the superficial layers is generally lower with b = 6 s/mm² than with b = 0. This indicates that VN gradient-induced signal suppression is more pronounced in the superficial layers. Additionally, in Figure 5, the VN gradients effectively suppressed macrovascular signals, as highlighted by the blue circles. These observations support the role of VN gradients in enhancing specificity by reducing superficial bias and macrovascular contamination. Furthermore, a bias toward the cortical surface was observed in the updated results in Figure 4.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) L141: "depth dependent" is slightly misleading here. It could be misunderstood to suggest that the authors are assessing how spatial specificity varies as a function of depth. Rather, they are assessing spatial specificity based on depth-dependent responses (double peak feature). Perhaps "layer-dependent spatial specificity" could be substituted with laminar specificity?

      We thank the reviewer for the suggestion. The term “depth dependent” has been replaced by “layer dependent” in the revised manuscript.

      (2) L146-149: these do not validate spatial specificity.

      The original text is removed.

      (3) L180: Maybe helpful to describe what the b-value is to assist unfamiliar readers.

      We have clarified the b-value as “the strength of the bipolar diffusion gradients” where it is first mentioned in the manuscript.
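      As a complement to this plain-language definition, the following Python sketch shows how the b-value of a rectangular bipolar gradient pair depends on both the gradient amplitude and its timing, using the Stejskal-Tanner expression b = γ²G²δ²(Δ − δ/3). Rectangular lobes without ramps are assumed, Δ is taken between the leading edges of the lobes, and the timings in the example are hypothetical rather than the values used in this study.

```python
import math

GAMMA_RAD = 2 * math.pi * 42.577e6  # proton gyromagnetic ratio, rad/(s*T)


def bipolar_b_value(G, delta, Delta):
    """b-value in s/mm^2 for rectangular lobes: amplitude G (T/m), lobe width
    delta (s), and separation Delta (s) between the leading edges of the lobes."""
    b_si = GAMMA_RAD ** 2 * G ** 2 * delta ** 2 * (Delta - delta / 3)  # s/m^2
    return b_si * 1e-6  # 1 s/m^2 = 1e-6 s/mm^2


def amplitude_for_b(b_mm, delta, Delta):
    """Gradient amplitude (T/m) needed to reach a target b-value (s/mm^2)
    with fixed lobe timing; inverts bipolar_b_value."""
    return math.sqrt(b_mm * 1e6 / (GAMMA_RAD ** 2 * delta ** 2 * (Delta - delta / 3)))


# Hypothetical timing: back-to-back 4 ms lobes (leading edges 4 ms apart).
for b in (6, 7, 8):
    G = amplitude_for_b(b, delta=4e-3, Delta=4e-3)
    print(f"b = {b} s/mm^2 -> G = {G * 1e3:.1f} mT/m")
```

      Because b scales with G² but also with δ² and Δ, the same b-value can be reached with different trade-offs between gradient amplitude and lobe duration.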

      (4) Figure 1B: I think it would be appropriate with a sentence of how the authors define micro/macrovasculature. Figure 1B seems to suggest that large ascending veins are considered microvascular which I believe is a bit unconventional. Nevertheless, as long as it is clearly stated, it should be fine.

      In our context, macrovasculature refers to vessels that are distal to neural activation sites and contribute to extravascular contamination. These vessels are typically larger in size (e.g., > 0.1 mm in diameter) and exhibit faster flow rates (e.g., > 10 mm/s).

      (5) I think the authors could be more upfront with the point about non-suppressed extravascular effects from macrovasculature, which was briefly mentioned in the discussion. It could already be highlighted in the introduction or theory section.

      We thank the reviewer for the suggestions. We have expanded the discussion of extravascular effects from macrovasculature in both the Introduction (5th paragraph) and Discussion (3rd paragraph) sections.

      (6) The phase regression figure feels a bit misplaced to me. If the authors agree: rather than showing the TE-dependency of the effect of phase regression, it may be more relevant for the present study to compare the conventional setup with phase regression, with the VN setup without phase regression. I.e., to show how the proposed setup compares to existing 3T laminar fMRI studies.

      In this revision, both the TE-dependent and VN-dependent effects of phase regression were investigated. The results in Figure 4 and Figure 5 demonstrated that phase regression effectively suppresses macrovascular contributions primarily near the gray matter/CSF boundary, irrespective of TE or the presence of VN gradients.

      (7) L520: It might be beneficial to also cite the large body of other laminar studies showing the double peak feature to underscore that it is highly robust, which increases its relevance as a test-bed to assess spatial specificity.

      We agree. Additional studies have been cited (Chai et al., 2020; Huber et al., 2017a; Knudsen et al., 2023; Priovoulos et al., 2023).

      (8) L557: The argument that only one participant was assessed to reduce inter-subject variability is hard to buy. If significant variability exists across subjects, this would be highly relevant to the authors and something they would want to capture.

      We thank the reviewer for the suggestions. In this revision, we have increased the number of participants to 4 for protocol development and 14 for resting-state functional connectivity analysis, allowing us to better assess and account for inter-subject variability.

      (9) L637: add download link and version number.

      The download link has been added as requested. The version number is not applicable.

      (10) L638: How was the phase data coil-combined?

      The reconstructed multi-channel data, which were complex-valued, were combined using the adaptive combination method (Walsh et al.; DOI: 10.1002/(sici)1522-2594(200005)43:5<682::aid-mrm10>3.0.co;2-g). The MATLAB code for this implementation was developed by Dr. Diego Hernando and is publicly available at https://github.com/welton0411/matlab . The phase data were then extracted using the MATLAB function ‘angle’.
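      For readers unfamiliar with adaptive combination, a simplified Python sketch of the idea is given below. It is not Dr. Hernando's MATLAB implementation and not the code used in this study: it omits Walsh's noise prewhitening (assuming white, uncorrelated coil noise), estimates the coil correlation matrix over a small square neighbourhood, weights the coils by its dominant eigenvector, and fixes the arbitrary eigenvector phase by referencing one coil. The function name `adaptive_combine` and its `half_window` and `ref_coil` parameters are hypothetical. The final phase extraction mirrors MATLAB's `angle` via `np.angle`.

```python
import numpy as np


def adaptive_combine(coil_images, half_window=2, ref_coil=0):
    """Simplified Walsh-style adaptive coil combination for one 2D slice.

    coil_images: complex array of shape (n_coils, nx, ny).
    Assumes white, uncorrelated coil noise (no noise prewhitening).
    """
    n_coils, nx, ny = coil_images.shape
    combined = np.zeros((nx, ny), dtype=complex)
    for ix in range(nx):
        for iy in range(ny):
            # Local neighbourhood used to estimate the coil correlation matrix
            x0, x1 = max(ix - half_window, 0), min(ix + half_window + 1, nx)
            y0, y1 = max(iy - half_window, 0), min(iy + half_window + 1, ny)
            block = coil_images[:, x0:x1, y0:y1].reshape(n_coils, -1)
            corr = block @ block.conj().T  # Hermitian (n_coils x n_coils)
            # Dominant eigenvector approximates the relative coil sensitivities
            _, vecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
            weights = vecs[:, -1]
            # Remove the arbitrary eigenvector phase by referencing one coil
            weights = weights * np.exp(-1j * np.angle(weights[ref_coil]))
            combined[ix, iy] = weights.conj() @ coil_images[:, ix, iy]
    return combined


# Phase extraction, analogous to MATLAB's angle():
# phase = np.angle(adaptive_combine(coil_images))
```

      In noise-free data with spatially constant sensitivities and a real, positive reference-coil sensitivity, the combined image equals the underlying image scaled by the norm of the sensitivity vector, so its phase is preserved.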

      (11) L639: Why was the smoothing filter parameter changed (other parameters were default)?

      The smoothing filter parameter was set based on the suggestion provided in the help comments of the NIFTI_NORDIC function:

      function  NIFTI_NORDIC(fn_magn_in,fn_phase_in,fn_out,ARG)
      % fMRI
      %
      %  ARG.phase_filter_width=10;

      In other words, we simply followed the recommendation outlined in the NIFTI_NORDIC function’s documentation.

      (12) I assume the phase data was motion corrected after transforming to real and imaginary components and using parameters estimated from magnitude data? Maybe add a few sentences about this.

      Prior to phase regression, the time series of real and imaginary components were subjected to motion correction, followed by phase unwrapping. The phase regression was incorporated early in the data processing pipeline to minimize the discrepancy in data processing between magnitude and phase images (Stanley et al., 2021).
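      To illustrate the regression step itself, a minimal voxel-wise OLS phase regression is sketched below; it is not the study's pipeline. It assumes the magnitude and phase time series are already motion corrected, unwraps the phase along time with `np.unwrap` (an assumption; the response does not state the unwrapping method), and subtracts only the phase-predicted fluctuation so that the mean magnitude is kept. The function name `phase_regression` is hypothetical.

```python
import numpy as np


def phase_regression(magnitude, phase):
    """OLS phase regression for a single voxel time series (sketch).

    magnitude, phase: 1D arrays of equal length; phase in radians and
    assumed to be motion corrected already.
    Returns (corrected_magnitude, slope).
    """
    magnitude = np.asarray(magnitude, dtype=float)
    phase = np.unwrap(np.asarray(phase, dtype=float))  # temporal unwrapping
    # Design matrix: intercept + phase regressor
    design = np.column_stack([np.ones_like(phase), phase])
    (_, slope), *_ = np.linalg.lstsq(design, magnitude, rcond=None)
    # Subtract the phase-explained fluctuation; the mean magnitude is kept
    corrected = magnitude - slope * (phase - phase.mean())
    return corrected, slope
```

      Because OLS treats the phase regressor as noise-free, any noise in the phase biases the slope toward zero; this is the motivation for the Deming alternative raised in comment (13).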

      (13) Was phase regression applied with e.g., a Deming model, which accounts for noise on both the x and y variables? In my experience, this makes a huge difference compared with regular OLS.

      We appreciate the reviewer’s insightful comment. We are aware that noise is present in both the magnitude and phase data, and that linear Deming regression would therefore be well suited to phase regression (Stanley et al., 2021). To perform Deming regression, however, the ratio of the magnitude error variance to the phase error variance must be predefined. In our initial tests, we found that the regression results were sensitive to this ratio. To avoid potential confounding, we opted to use OLS regression for the current analysis. However, we agree that a Deming model could enhance the efficacy of phase regression if the ratio could be determined objectively and properly.
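      The dependence on the variance ratio can be seen directly in the closed-form Deming estimator, sketched below for illustration (the function name `deming_fit` is hypothetical). Here `delta` is the assumed ratio of the error variance in y (magnitude) to the error variance in x (phase); as `delta` grows large the estimate approaches the OLS slope of magnitude on phase, which corresponds to assuming noise-free phase.

```python
import numpy as np


def deming_fit(x, y, delta):
    """Deming regression of y on x (sketch).

    delta: assumed ratio of the error variance in y to the error variance
    in x. For phase regression, y = magnitude and x = phase.
    As delta grows large, the slope approaches the OLS slope of y on x.
    Returns (slope, intercept).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxx = np.var(x)
    syy = np.var(y)
    sxy = np.mean((x - x.mean()) * (y - y.mean()))
    d = syy - delta * sxx
    slope = (d + np.sqrt(d ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept
```

      Evaluating `deming_fit` on the same noisy data for several values of `delta` is a quick way to see the sensitivity to the variance ratio described above; on noise-free data every `delta` returns the same line.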

      (14) Figure 2: What is error bar reflecting? I don't think the across-voxel error, as also used in Figure 4, is super meaningful as it assumes the same response of all voxels within a layer (might be alright for such a small ROI). Would it be better to e.g. estimate single-trial response magnitude (percent signal change) and assess variability across? Also, it is not obvious to me why b=30 was chosen. The authors argue that larger values may kill signal, but based on this Figure in isolation, b=48 did not have smaller response magnitudes (larger if anything).

      We agree with the reviewer’s opinion on the across-voxel error. In the revised manuscript, the signal was averaged within each layer before the GLM analysis, and signal variation was calculated from the temporal residuals. The technical details of these changes are described in the "Materials and Methods" section.

      Additionally, the bipolar diffusion gradients were modified from a single direction to three orthogonal directions. As a result, the questions and results related to b=30 or b=48 are no longer applicable.
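      The layer-averaged GLM described in the response to comment (14) can be sketched as follows. This is an illustration under stated assumptions, not the study's code: it uses plain OLS without modelling temporal autocorrelation, and the function name `layer_glm` and the design matrix layout are hypothetical. Voxels are averaged within a layer first, and the noise variance for the standard errors comes from the temporal residuals of that single averaged time series rather than from across-voxel spread.

```python
import numpy as np


def layer_glm(layer_voxels, design):
    """Average voxels within a layer, then fit an OLS GLM (sketch).

    layer_voxels: array (n_timepoints, n_voxels) for one cortical layer.
    design: array (n_timepoints, n_regressors), e.g. task regressor + intercept.
    Returns betas, standard errors, and t values, with the noise variance
    estimated from the temporal residuals (no autocorrelation correction).
    """
    y = layer_voxels.mean(axis=1)  # layer-averaged time series
    betas, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ betas
    dof = len(y) - np.linalg.matrix_rank(design)
    sigma2 = residuals @ residuals / dof  # temporal residual variance
    cov = sigma2 * np.linalg.pinv(design.T @ design)
    se = np.sqrt(np.diag(cov))
    return betas, se, betas / se
```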

      (15) Figure 5: would be informative to quantify the effect of phase regression over a large ROI and evaluate reduction in macrovascular influence from superficial bias in laminar profiles.

      We appreciate the reviewer’s suggestion. In the revised manuscript, the reduction in macrovascular influence from superficial bias across a large ROI is displayed in Figure 5. Additionally, the impact on laminar profiles is demonstrated in Figure 4.

      (16) L406-408: What kind of robustness?

      We acknowledge that describing the protocol as “robust” was an overstatement. The updated results indicate that the current protocol for submillimeter fMRI may not yet be suitable for reliable individual-level layer-dependent functional mapping. However, group-level functional connectivity (FC) analyses demonstrated clear layer-specific distinctions with VN fMRI, which were not evident in conventional fMRI. These findings highlight the enhanced layer specificity achievable with VN fMRI.

      (17) Figure 8: I think C) needs pointers to superficial, middle, and deep layers? Why is it not in the same format as in Figure 9C? The discussion of the FC results could benefit from more references supporting that these observations are in line with the literature.

      In the revised results, the layer pooling shown in Figure 9c has been removed, making the question regarding format alignment no longer applicable. Additionally, references supporting the FC results have been added to the revised Discussion section (7th paragraph).

      (18) L456-457: But correlation coefficients may also be biased by different CNR across layers.

      That is correct. In the updated FC results in Figures 7 to 9, we used group-level statistics rather than correlation coefficients.

      Reviewer #3 (Recommendations For The Authors):

      The results in Figure 2-6 should be repeated over, or averaged over, a (small) group of participants. N=6 is usual in this field. I would seriously reconsider the multiband acceleration - the acquisition seemingly cannot support the SNR hit.

      A few more specific points are given below:

      (1) Abstract: The sentence about LGN in the abstract came for me out of the blue - why would LGN be important here, it's not even a motor network node? Perhaps the aims of the study should be made more clear - if it's about networks as suggested earlier then a network analysis result would be expected too. Expanding the directed FC findings would improve the logical flow of the abstract. Given the many concerns, removing the connectivity analysis altogether would also be an option.

      We thank the reviewer for the suggestions. The LGN-related results indeed diluted the focus of this study and have been completely removed in this revision.

      (2) Line 105: in addition to the VASO method, ..

      The corresponding text has been revised, and as a result, the reviewer’s suggestion is no longer applicable.

      (3) If out of the set MB 4 / 5 / 6 MB4 was best, why did the authors not continue with a comparison including MB3 and MB2? It seems to me unlikely that the MB4 acquisition is actually optimal.

      We appreciate the reviewer’s suggestions. In this revision, we decreased the MB factor to 3, as this allowed us to increase the in-plane acceleration factor to 3, thereby shortening the TE. The resulting sensitivity for both individual and group-level results is detailed in earlier responses, such as the response to Q16 for Reviewer #2.

      (4) The formatting of the references is occasionally flawed, including first names and/or initials. Please consider using a reliable reference manager.

      We used Zotero as our reference manager in this revision to ensure consistency and accuracy. The references have been formatted according to the APA style.

      (5) In the caption of Figure 5, corrected and uncorrected p values are identical. What multiple comparisons correction was made here? A multiple comparisons correction over voxels (as is standard) would usually lead to a cut-off of ~z=3.2. That would remove most of the 'responses' shown in Figure 5.

      We appreciate the reviewer’s comment. The original results presented in Figure 5 have been removed in the revised manuscript, making this comment no longer applicable.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors developed a mathematical model to predict human biological ages using physiological traits. This model provides a way to identify environmental and genetic factors that impact aging and lifespan.

      Strengths:

      (1) The topic addressed by the authors - human age prediction using physiological traits - is an extremely interesting, important, and challenging question in the aging field. One of the biggest challenges is the lack of well-controlled data from a large number of humans. However, the authors took on this challenge and tried their best to extract useful information from available data.

      The authors thank the anonymous reviewer for agreeing that building and analyzing physiological clocks is an interesting and important, albeit challenging, task.

      (2) Some of the findings can provide valuable guidelines for future experimental design for human and animal studies. For example, it was found that this mathematical model can best predict age when all different organ and physiological systems are sampled. This finding makes sense in general but can be, and has been, neglected when people use molecular markers to predict age. Most of those studies have used only one molecular trait or different traits from one tissue.

      The authors thank the anonymous reviewer for highlighting the importance of our approach of sampling traits for biological age prediction from multiple organs and systems, which ultimately provides more holistic information.

      Weaknesses:

      (1) As I mentioned above, the Biobank data used here are not designed for this current study, so there are many limitations for model development using these data, e.g., missing data points and irrelevant measurements for aging. This is a common caveat for human studies and has been discussed by the authors.

      Thank you for pointing out the caveats. Indeed, most databases and datasets including the UKBB that we use here have missing or inaccurate entries. We do discuss it in the text, as well as suggest and employ strategies to mitigate these caveats. We now updated the text to highlight these issues even further. Specifically, in the second paragraph of the “Results” section, we added the following text: “Most large human databases and datasets, including UKBB, have certain limitations, such as incomplete or missing data points. Therefore, before proceeding to modelling aging, we needed to address the following three issues:”

      (2) There is no validation dataset to verify the proposed model. The authors suggested that human biological age can be predicted with high accuracy using 12 simple physiological measurements. It will be super useful and convincing if another biobank dataset containing those 12 traits can be applied to the current model.

      Thank you for this comment. Indeed, having a replication cohort would be quite valuable. As of today, there is no comparable dataset to verify the performance of the clock model or to attempt to validate GWAS results. The closest possible is the NIH-led research program “All Of Us”, which aims to collect data on 1 million people but unfortunately is not available to for-profit companies. It is theoretically possible to rebuild a clock using only a small number of phenotypes present in both datasets, with the goal of training it on one dataset and test-applying it to another, but this would not ultimately address the accuracy of the holistic physiological clock presented here. We hope academic labs will utilize our clock-modeling approach, apply it to datasets currently unavailable to us, and publish their findings.

      To strengthen the credentials of our biological clock, we would like to remind the reviewer that we performed 10 rounds of validation, where, in each round, 10% of the data were left out of model training so that the clock was created using the remaining 90%. The model was subsequently tested on the 10% that was left out. Over the 10 rounds, a different 10% of the data was left out each time, and statistics for this 10-fold cross-validation are available in the supplementary materials. We have now updated the text to make this validation more apparent.

      Specifically, we added to the "Results” section, “A mathematical model to predict age” subsection, third paragraph, the following text: “Specifically, we performed 10 rounds of cross-validation, where 10% of data were held out and the remaining 90% used for training. Over 10 rounds, different 10% were held out for validation. In each case, the findings were validated in the test set. Full statistics and approach are described in supplementary computational methods.”

      Full details of this cross-validation are provided in the supplementary methods.
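      The 10-fold hold-out scheme described above can be sketched as follows, for illustration only. Ordinary least squares with an intercept stands in for the PLS model actually used, the function name `kfold_rmsep` and the random `seed` are hypothetical, and the per-fold score is the Root Mean Square Error of Prediction (RMSEP) on the held-out 10%.

```python
import numpy as np


def kfold_rmsep(X, y, n_folds=10, seed=0):
    """K-fold cross-validation of a linear age model (sketch).

    OLS with an intercept stands in for the PLS model used in the study.
    Each fold holds out a different ~1/n_folds of individuals for testing.
    Returns the RMSEP of each held-out fold.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    rmsep = []
    for test in folds:
        train = np.setdiff1d(idx, test)
        X_train = np.column_stack([np.ones(len(train)), X[train]])
        X_test = np.column_stack([np.ones(len(test)), X[test]])
        coef, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)
        resid = y[test] - X_test @ coef
        rmsep.append(np.sqrt(np.mean(resid ** 2)))
    return np.array(rmsep)
```

      The spread of the returned per-fold values is one way to express the between-fold variation of RMSEP that is reported in the supplementary computational methods.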

      Additionally, we compared published GWAS results obtained for human aging clocks using modalities that were different yet relevant to human health. Specifically, we looked at GWAS for “Epigenetic Blood Age Acceleration” (Lu et al., 2018), ML-imaging-based human retinal aging clock (Ahadi et al., 2023), PhenoAgeAcceleration and BioAgeAcceleration (Kuo et al., 2021), and the ∆Age GWAS that we presented in our manuscript. We now describe the results of this comparison in our manuscript. Briefly, there is no overlap between GWAS results for any two of these published clocks built via different modalities – retina, DNA methylation, or physiological functions (between each other or with our model). However, there is a significant genetic overlap (p < 10<sup>-8</sup>) between clocks built using human phenotypic measures in the National Health and Nutrition Examination Survey (NHANES) III cohort in the United States (7 variables) and ∆Age from the physiological clock from UKBB that we describe here (121 variables), further validating our approach. It is interesting to consider why genetic associations for human aging built using different modalities do not appear to have common genetic correlates, something we now also discuss in our manuscript.

      Specifically, we added to the "Results” section, “Genetic loci associated with biological age” subsection, third paragraph, the following text: “Additionally, we compared our ∆Age GWAS association results with similar GWAS studies that were performed for other biological clocks. For example, (McCartney et al., 2021) used DNA methylation data on 40,000 individuals to compute biological age called GrimAge. After that they calculated an intrinsic epigenetic age acceleration (IEAA, a value similar to ∆Age, which measured a deviation of biological age from chronological age) and performed GWAS.” Additionally, we added to the “Discussion” section, “Broader implications of the model for physiological aging” subsection, fourth paragraph, the following text: “To further analyze the meaning of genetic associations with ∆Age that we described above, we compared several published GWAS results obtained for human aging clocks using different health modalities. Specifically, we looked at GWAS for “Epigenetic Blood Age Acceleration” (Lu et al., 2018), ML-imaging-based human retinal aging clock (Ahadi et al., 2023), PhenoAgeAcceleration and BioAgeAcceleration (Kuo et al., 2021), and the ∆Age GWAS we presented in our manuscript. Surprisingly, we discovered that there is no overlap between GWAS results for any two of these clocks built via different modalities – retina, DNA methylation, or physiological functions. However, there is a significant genetic overlap between clocks built using human phenotypic measures and our ∆Age model we describe. For example, the Biological Age Clock Acceleration calculated using HbA1c, Albumin, Cholesterol, FEV, Urea nitrogen, SBP, and Creatinine (Levine, 2013) in a US cohort [from National Health and Nutrition Examination Survey (NHANES)] yielded 16 significant hits in the GWAS analysis, five of which were also significant in our GWAS for UKBB based ∆Age. 
These five common loci were close to the following genes - APOB, PIK3CG, TRIB1, SMARCA4, and APOE. The significance of this overlap is p < 10<sup>-8</sup>, suggesting that the ∆Age model we propose might be translatable to other cohorts of people.

      An interesting question to consider is why GWAS results from other clock modalities, such as DNA methylation and retinal imaging, do not yield any genetic similarities to each other or to physiological and biological clocks. It is possible that these modalities of age assessment depend on genetically independent biological processes. For example, in simplified terms, DNA methylation clocks might be heavily weighted toward blood composition, retinal scans toward vascular structure, and physiological clocks toward muscle, bone, and kidney health. Data from model organisms suggest that master regulators of aging exist, and APOE is the best-established genetic locus known to influence human aging. Interestingly, only the biological and physiological clock models that we propose here pick it up as a hit. Alternatively, it is also possible that the true master regulators of aging rate are under stringent purifying selection (for example, due to an important role in development) and therefore lack genetic variability in the human populations examined. As such, they could not be identified as hits in any GWAS.”

      Reviewer #2 (Public Review):

      In this manuscript, Libert et al. develop a model to predict an individual's age using physiological traits from multiple organ systems. The difference between the predicted biological age and the chronological age -- ∆Age, has an effect equivalent to that of a chronological year on Gompertz mortality risk. By conducting GWAS on ∆Age, the authors identify genetic factors that affect aging and distinguish those associated with age-related diseases. The study also uncovers environmental factors and employs dropout analysis to identify potential biomarkers and drivers for ∆Age. This research not only reveals new factors potentially affecting aging but also shows promise for evaluating therapeutics aimed at prolonging a healthy lifespan. This work represents a significant advancement in data-driven understanding of aging and provides new insights into human aging. Addressing the points raised would enhance its scientific validity and broaden its implications.

      Thank you!

      Major points:

      (1) Enhance the description and clarity of model evaluation.

      The manuscript requires additional details regarding the model's evaluation. The authors have stated "To develop a model that predicts age, we experimented with several algorithms, including simple linear regression, Gradient Boosting Machine (GBM) and Partial Least Squares regression (PLS). The outcomes of these approaches were almost identical". It is currently unclear whether the 'almost identical outcomes' mentioned refer to the similarity in top contribution phenotypes, the accuracy of age prediction, or both. To resolve this ambiguity, it would be beneficial to include specific results and comparisons from each of these models.

      Thank you for this comment. We now describe the model selection in more detail and provide comparisons of outcomes. Briefly, different approaches have different advantages and limitations; we chose a single approach, rather than developing and analyzing several independent models in parallel, so as not to artificially inflate our False Discovery Rate (FDR). We now also provide the rationale for this choice and the comparative performance of the three approaches. Specifically, we added the following text to the “Results” section, “A mathematical model to predict age” subsection, first paragraph: “Different approaches have different advantages and limitations; however, we chose a single approach, rather than developing and analyzing several independent models in parallel, so as not to artificially inflate the False Discovery Rate (FDR). We ultimately selected PLS regression because it enabled us to determine the number and composition of components required to predict age optimally from the data, which provides additional insights into the biology of human aging. Before making this selection, however, we compared the performance of the three approaches. The outcomes of PLS and linear regression were almost identical (the R-squared between ∆Age values derived by these two methods was 0.99, meaning that the two models made nearly the same prediction for each individual). This similarity is likely due to the small number of predictors (121 phenotypes) relative to the large number of participants (over 400,000). The agreement between GBM model outcomes and those of PLS (and linear regression) was slightly lower (R-squared = 0.87), likely because PLS and linear regression require imputation. The GBM model tolerates missing data, whereas linear regression and PLS require imputation or removal of individuals with too many missing datapoints, an approach we describe in more detail below.”

      Additionally, after we obtained associations of ∆Age values with genetic loci, which formed the candidate base for gene targets to influence human aging (Figure 5b), we verified the top associations obtained via the PLS model in the linear and GBM models. All the top candidates that we verified had statistically significant associations in all models of ∆Age (CST3, APOE, HLA locus, CPS1, PIK3CG, IGF1). The precise strengths of the associations differed, as expected given that the linear models used partially imputed data while the GBM model was built with missing values. We believe that, due to the small number of predictors (121) compared with the vastly larger number of individuals (over 400,000), the differences the three models introduced to the final outcomes were quite small.

      To convey this message, we added to the "Discussion” section, “Broader implications of the model for physiological aging” subsection, 7th paragraph, the following text: “It is interesting to note that the three approaches we used to generate age prediction model (PLS, GBM, and linear regression) yielded very similar or identical results in performance. We chose to settle on one approach (PLS) to not artificially inflate the False Discovery Rate (FDR); however, we verified that the top genetic loci associations obtained via the PLS model were also obtained in the GBM and linear models. Specifically, the top candidates (CST3, APOE, HLA locus, CPS1, PIK3CG, IGF1) identified in the PLS approach had statistically significant associations in all the models of ∆Age. It is likely that due to the small number of predictors (121) compared to a vastly larger number of individuals (over 400,000), the differences that these models introduce to final outcomes are quite small, which increases our confidence in the results.”

      Furthermore, the authors mention "to test for overfitting, a PLS model had been generated on randomly selected 90% of individuals and tested on the remaining 10% with similar results". To comprehensively assess the model's performance, it is crucial to provide detailed results for both the test and validation datasets. This should at least include metrics such as correlation coefficients and mean squared error for both training and test datasets.

      Thank you for bringing up this point. A detailed description of the cross-validation procedure, together with its statistics, is provided in the supplementary computational methods. Briefly, across 10 rounds of validation the Root Mean Square Error of Prediction (RMSEP) did not exceed 4.81 for females when all 9 PLS components were considered, and the RMSEP for males was 5.1 when all 11 components were considered. The variation of RMSEP between different datasets was less than 0.1. We have now updated the text to make this validation more apparent. Specifically, we added the following text to the “Results” section, “A mathematical model to predict age” subsection, third paragraph: “Specifically, we performed 10 rounds of cross-validation, where 10% of data were held out and the remaining 90% used for training. Over 10 rounds, different 10% were held out for validation. In each case, the findings were validated in the test set. Full statistics and approach are described in supplementary computational methods.”

      (2) External validation and generalization of results

      To enhance the robustness and generalizability of the study's findings, it is crucial to perform external validation using an independent population. Specifically, conducting validation with the participants of the 'All of Us' research program offers a unique opportunity. This diverse and extensive cohort, distinct from the initial study group, will serve as an independent validation set, providing insights into the applicability of the study's conclusions across varied demographics.

      Thank you for this comment. As we mentioned above, we agree that having a replication cohort would be very valuable for this study, as well as for many other studies that stem from the UKBB dataset. However, there is as yet no comparable dataset to verify the performance of the clock or to attempt to validate GWAS results. The closest possible is the NIH-led research program “All Of Us”, which aims to collect data on 1 million people but unfortunately is not available to for-profit companies. It is theoretically possible to rebuild a clock using only the small number of phenotypes present in both datasets, with the goal of training it on one dataset and test-applying it to another, but that approach would not ultimately be informative about the accuracy of the complete physiological clock presented here. We hope academic labs will utilize our clock approach, apply it to datasets currently unavailable to us, and publish their findings. For the detailed response on this issue, please see the response to the second comment of the first reviewer above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific questions/suggestions:

      - It looks like the ages of participants are enriched around 60 years (Fig. 1, Fig 3b). Can the authors clarify whether the age distribution affects the correlation tests (e.g. correlation in Fig 2)?

      Indeed, the distribution of participants by age is enriched for 60–65-year-olds and depleted at younger and older ages. This distribution influences the uncertainty of the correlations we compute, with larger error bars for 40- and 70-year-olds and smaller ones for 50- and 60-year-olds. An example can be seen in Figure 1F. Figures 2a,b,g,h mostly deal with correlations of phenotypes with each other and thus are not influenced by age. For other computations, such as age prediction, it is theoretically possible that if age determinants among 65-year-olds differ from those for 40- or 80-year-olds, the calculated contributions would be skewed to increase accuracy in the middle of the distribution at the expense of the ends. ∆Age, however, was explicitly normalized for each age cohort (Fig. 3a) to avoid “birth cohort” bias, thereby minimizing the effect of the uneven distribution on further analyses, such as GWAS. We now acknowledge and describe this feature of the UKBB dataset in the first paragraph of the “Results” section.
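      The response does not spell out how the per-cohort normalization was implemented, so the sketch below assumes one simple variant for illustration: ∆Age (predicted minus chronological age) is centered within each one-year age cohort. The function name `normalize_delta_age` and the `cohort_width` parameter are hypothetical. With single-year cohorts, centering makes the normalized ∆Age uncorrelated with age by construction, even when the raw predictor regresses toward the mean age.

```python
import numpy as np


def normalize_delta_age(predicted_age, age, cohort_width=1):
    """Center DeltaAge within each age cohort (sketch, assumed scheme).

    predicted_age, age: 1D arrays in years.
    Cohorts are bins of width cohort_width years.
    Returns DeltaAge with each cohort's mean removed.
    """
    predicted_age = np.asarray(predicted_age, dtype=float)
    age = np.asarray(age, dtype=float)
    delta = predicted_age - age
    cohort = np.floor(age / cohort_width).astype(int)
    out = np.empty_like(delta)
    for c in np.unique(cohort):
        mask = cohort == c
        # Subtract the cohort mean so every cohort is centered at zero
        out[mask] = delta[mask] - delta[mask].mean()
    return out
```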

      - Phenotypic variation usually increases during aging. However, the authors showed that delta-age and age are not correlated (Figure 3a), suggesting that biological variation does not increase during aging in their analysis. Can authors provide more evidence supporting their findings? Is this phenomenon affected by their normalization method?

      Thank you for this comment. We find that there is no strict rule for how phenotypic variation changes with age. Certain phenotypes, such as blood pressure (Fig. 1a) or SHBG (Fig. 1d), do increase in variation with advanced age; however, many others, such as grip strength (Fig. 1b) and BMI, do not change in variation, and certain phenotypes even decrease in variation with age. As stated above, to minimize the possible effect of “birth cohort” bias on subsequent analyses, as well as of the uneven distribution of people across ages, ∆Age was normalized per age cohort. Additionally, purifying selection likely also limits how far most physiological factors can deviate. For example, people with too high or too low blood pressures would simply perish, which would limit a continuous increase in variation.

      - Authors correlate GWAS data with delta-age (Figure 4). It would be important to show whether the delta-age from young and old participants correlates with GWAS patterns in a similar manner. If not, the authors have to consider how age differences affect delta-age and the GWAS correlation. For example, the authors mentioned that APOE genotype influences age-delta even in the 40-year-old group (Figure 4f). If the APOE genotype already shows high delta-age in the 40-year-old group, how does aging affect the delta-age distribution?

      Thank you for this comment. It is an interesting question to understand how age influences GWAS hits identified through ∆Age. At the same time, one must remember that our dataset is cross-sectional in nature and “different age” in reality is a subset of different people, which lived in different times with different exposures to environments and different standards of medical care (which are evolving over time). We specifically attempted to factor age and this “cohort effect” out of our analysis and presented Figure 4f simply as an illustration that APOE variants seem to influence human aging at any age, which challenges the theory proposed by previous studies that APOE is implicated in aging simply because APOE4 carriers likely die from Alzheimer disease and are thus excluded from the oldest cohorts. To investigate the question raised by the reviewer it is possible to do GWAS on age, however one must keep in mind the limitations associated with interpreting those results; as “age” in reality (in this cross-sectional cohort) also represents changes in population composition, changes in the environment, food quality, early life care, medical care, social habits, and other parameters associated with changing society.

      - For the discussion part, it would be great if the authors could add one section to provide guidelines for future human and lab animal studies based on observations from the current study. For example, what physiological traits are most useful, and what can be further added when collecting human data?

      Thank you for the great suggestion. We now propose and discuss certain experiments that can be performed in humans and animals to better differentiate between drivers and markers of aging.

      - In line 479, I found the statement "It is possible that synapse function accounts for the association of computer gaming with ΔAge" came from nowhere, and suggest removing it.

      Done—thank you.

      - Minor. Line 155. Is this an incorrect citation of Tables S2c and S2d, as there are only S2a and S2b?

      Thank you, corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) Between lines 300-305, there is a missing reference to Figure 3e.

      Thank you, corrected.

      (2) For Figures 4a and 4c, please add the lambda statistic to the QQ plots.

      Thank you, we have added lambda inflation factors to the QQ plots.
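      For reference, the genomic inflation factor shown on QQ plots is conventionally the median of the observed 1-df chi-square statistics divided by its expected null median (about 0.455). A minimal standard-library sketch is given below, assuming two-sided p-values in (0, 1]; the function name `genomic_inflation` is hypothetical and this is not the pipeline used for the figures.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from two-sided GWAS p-values.

    Each p-value is converted to a 1-df chi-square statistic via the
    normal quantile, and the median is divided by the null median.
    p-values must lie in (0, 1].
    """
    nd = NormalDist()
    # Using p/2 (lower tail) avoids 1 - p/2 rounding to 1.0 for tiny p
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    null_median = nd.inv_cdf(0.25) ** 2  # median of chi-square with 1 df
    return median(chi2) / null_median
```

      Values near 1 indicate little genome-wide inflation; values well above 1 can reflect polygenicity or residual stratification.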

      (3) In line 384, the p-value cut-off is mentioned as 10<sup>-9</sup>. However, this does not seem to be consistently represented in Figures 4b and 4d, where the gray lines do not align with this threshold. Please adjust these figures to accurately reflect the mentioned p-value cut-off.

      Thank you, corrected.

      (4) Clarification for Figure 5a. Add titles and correlation coefficients to Figure 5a to clearly define what the clusters represent. Please also add a discussion to explain why the cluster 10 (general health) dropout model can affect ∆Age compared to the full model, with some individuals showing a 5-year difference. Furthermore, despite the substantial effect of removing cluster 10 on ΔAge, all the top loci remain unchanged in terms of effect sizes and p-values compared to the full model.

      We have added the titles and correlation coefficients to Figure 5a. Thank you for these suggestions; they make the presentation of the data much clearer. It is an interesting observation that, whereas dropping out cluster 10 resulted in quite significant changes in the ∆Age distribution, the genetic signature as determined by GWAS did not change much. The most obvious explanation is that many parameters in this category are influenced by environment more than by genetics, so the genetic signature changed little after the cluster was removed. We now mention this observation in the text. Specifically, in the subsection “Cluster-dropout analysis enriches for GWAS hits that influence aging globally”, we added the following text: “Another interesting observation is that the degree to which a given cluster contributes to the model does not necessarily correlate with how much this cluster contributes to the genetic signature of human aging. For example, while dropping out cluster 10 (General Health) resulted in quite significant changes in the ∆Age distribution (R<sup>2</sup>=0.88), the genetic signature as determined by GWAS did not change substantially, and certainly not as much as it did upon removal of cluster 1 (muscle-related). The most likely explanation is that many parameters in this category are influenced more strongly by environment than by genetics.”

      (5) Discussion on drivers and markers. Given the theoretical nature of the study, it would be beneficial to propose potential experimental validations for your findings. Even if these validations have not been performed, suggesting them would greatly enhance the value of the discussion.

      Thank you; this is a great idea. We now propose and discuss certain experiments that can be performed in humans and animals to better differentiate between drivers and markers of aging. Specifically, in the subsection “Cluster-dropout analysis enriches for GWAS hits that influence aging globally”, we added the following text: “To definitively distinguish whether a gene is a driver or a marker of aging, an experiment would need to be performed. It is possible that certain gene activities are influenced by existing FDA-approved medications, in which case retrospective analyses of human cohorts taking those medications can be performed. More likely, however, an animal model would need to be employed, in which animals with genetically modified candidate genes are examined for lifespan and for the onset and progression of age-associated conditions. For example, one can engineer a mouse with a conditional allele of Cystatin-C and evaluate how changes in the dosage of this protein influence various phenotypes of aging.”

    1. Author response:

      Reviewing editor comments:

      Overall, the reviewers found the imaging data to be strong but identified the physiology experiments as the weakest aspect of the study. Please consider either removing Figures 7 and 8 from the manuscript or significantly revising the data. If you choose to revise these figures, refer to the specific reviewer comments addressing them. Additionally, several reviewers noted that the prior literature was not adequately cited, so please consider addressing this concern.

      As noted below, we will work to strengthen the physiological side of the study and ensure that we are more scrupulous in citing the prior literature. Below we summarize the major concerns of each reviewer and outline our proposed response.

      Reviewer #1:

      (1) Sex differences and generalizability

      Various studies have shown sex differences in emotional responses and neural activity in mice, but studying both male and female mice would have required many more mice than we could accommodate for practical reasons. We therefore chose to use only female mice, to lay a solid foundation for future studies that compare (and perhaps contrast) them with males.

      We will:

      Make clear in the main text that we used only females.

      Cite literature on sex-specific mPFC-BLA/NAc functions in the Discussion.

      (2) Missing link between behavioral states and "emotional states"...relevant readouts such as cortisol

      We appreciate the reviewer pointing out this inadvertent conceptual slippage. We will:

      Include corticosterone measurements using an ELISA kit on archived plasma samples (collected before and after OFT/EPM tests) to correlate with behavioral and neural activity (following the approach of Panczyszyn-Trzewik et al., Steroids, 2024).

      Be more precise in our language to differentiate behavioral correlates from inferred emotional states.

      Carefully review the literature on OFT center time, EPM open-arm exploration, and tube test outcomes as anxiety/social hierarchy indicators and decide the best interpretation for our findings.

      (3) Improve methodological detail and rigor of population-level analysis

      We will:

      Expand the methods section with electrophysiology parameters (e.g., access resistance criteria, stimulus protocols).

      Add detailed histology figures (viral targeting, electrode placements) for mPFC-BLA/NAc projections.

      Include raw data points in all plots and report exact p-values, effect sizes, and group sizes (e.g., n = 12 cells from 4 mice).

      To enhance statistical rigor, we will provide clearer scatter plots with individual data points, report exact p-values, and specify group sizes in all figures.

      (4) Acute vs. sustained effects after tube test and additional controls

      We would like to clarify that we used repeated tube tests (3 times a day and continuing for 7 days) for assessing sustained rank effects. To address concerns about sustained emotional state changes post-tube test, we will:

      Assess corticosterone levels pre/post-tube test (following the approach of Panczyszyn-Trzewik et al., Steroids, 2024).

      Discuss the transient nature of hierarchy effects and cite studies using repeated tube tests for sustained rank effects.

      Reviewer #2:

      (1) Sub-region targeting in BLA/NAc

      Although different subregions within the BLA and NAc receive distinct inputs and exhibit diverse functions, comparing neuronal activity across these subregions is beyond the scope of this paper. Our primary focus is on mPFC projections, emphasizing presynaptic activity rather than postsynaptic activity within the NAc and BLA. We focused on the PL-NAc shell and PL-BLA (BA) regions because PL-to-NAc shell projections in mice are well documented, particularly in studies utilizing viral tracers and optogenetic tools (Britt et al., Neuron, 2012; Bossert et al., J. Neurosci., 2012). These projections regulate aversive behaviors, stress responses, and motivational states, and are implicated in drug-seeking behaviors and emotional valence encoding (Jocelyn & Berridge, Biol. Psychiatry, 2013; Fetcho et al., Nat. Commun., 2023; Capuzzo & Floresco, J. Neurosci., 2020; Xie et al., BioRxiv., 2025; Domingues et al., Nat Commun., 2025). The PL-BLA projection, in turn, is topographically organized across BLA subregions and primarily targets the basal (BA) nucleus of the BLA (McGarry & Carter, J. Neurosci., 2016; Hoover & Vertes, Brain Struct. Funct., 2007). Both the recorded NAc shell and BLA subregions are involved in emotional valence encoding.

      A detailed comparison of neuronal activity across different NAc shell and BLA subregions, or a comparison of different cell types such as NAc shell D1- and D2-medium spiny neurons, could each be the subject of a separate study. Nevertheless, we will discuss in the Discussion how sub-region connectivity could contribute to the observed heterogeneity, citing relevant studies, and we will clarify the rationale for our experimental design.

      (2) Electrophysiological confounds

      To strengthen the rationale for our patch-clamp recordings, we will:

      Clarify in methods that recordings were performed in acute slices from behaviorally naive mice (post-tube test) to isolate synaptic changes.

      Include access resistance and cell health criteria (e.g., resting membrane potential, input resistance ranges), along with precise optogenetic stimulus protocols.

      Add example traces of mEPSCs/mIPSCs and quantify exclusion rates.

      Reviewer #3:

      (1) Specify the sexes used throughout the manuscript.

      We will make this clear throughout the paper.

      (2) Exclusion of mice lacking "center-ON" neurons

      We will:

      Explain the exclusion of mice that lacked center-ON neurons. We will also discuss the potential interpretations (e.g., floor effects in anxiety tasks) in the limitations section.

      (3) Baseline activity comparisons

      We will:

      Add baseline neuronal activity comparison between mPFC-BLA and mPFC-NAc neurons.

      (4) Stress from repeated behavioral testing

      We will:

      Clarify our experimental design to state how we tried to minimize the stress caused by multiple behavioral assays.

      Include pre-test habituation protocols in methods.

      Discuss potential cumulative stress effects in limitations.

      (5) Grooming classification

      While the reviewer is correct that grooming can be a stress-relieving behavior, it obviously also has many other functions, from the pragmatic to the social. In our study, grooming occurred primarily in the periphery of the open field test, where it corresponded to neural activity patterns that differed from those observed in the center. As we classify the behavior in the center zone of the open field test as anxiety-like, we interpreted the peripheral grooming as indicative of the animal's adjustment to a novel environment, as suggested by previous work (Estanislau et al., Neurosci. Res., 2013; Rojas-Carvajal et al., Animal Behaviour, 2018). The grooming was primarily rostral body-licking, which accords with what Rojas-Carvajal et al. call a “de-arousal inhibition system” that subserves novelty habituation. The duration and nature of this behavior are, interestingly enough, influenced by whether the mouse or rat lived in an enriched environment prior to the OFT (enriched environments made them quicker to explore a new environment but also quicker to get bored; no surprise, really).

      We did not explain any of this in the manuscript, however, so in our revision, we will make sure to discuss these nuances and cite the relevant literature.

      (6) Integrate neuronal activity and behavioral data

      We will:

      Include additional analyses quantifying neuronal activity overlap across tasks and refine our Discussion to better integrate these findings with prior literature.

      Perform cross-correlation analyses to quantify activity overlap between OFT, EPM, and SI tasks.

      Minor weaknesses

      - Clarify the cohorts of mice that were used for each behavioral assay.

      - Adjust Figure 2G scale and add insets to highlight sniffing differences.

      - Specify that M1/M2 were age-/sex-matched unfamiliar mice in the three-chamber test.

      - Detail statistical tests (e.g., mixed-effects models) and animal selection criteria in methods.

      We believe these revisions will address the reviewers’ major concerns and significantly improve the manuscript. We welcome further feedback on these plans and will provide updated figures/data for the resubmission.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors in this study extensively investigate how telomere length (TL) regulates hTERT expression via non-telomeric binding of the telomere-associated protein TRF2. They conclusively show that TRF2 binding to long telomeres results in reduced binding to the hTERT promoter. In contrast, short telomeres restore TRF2 binding at the hTERT promoter, recruiting repressor complexes such as PRC2 and suppressing hTERT expression. The study presents several significant findings revealing a previously unknown mechanism of hTERT regulation by TRF2 in a TL-dependent manner.

      Strengths:

      (1) A previously unknown mechanism linking telomere length and hTERT regulation through non-telomeric TRF2 binding has been established, strengthening our understanding of telomere biology.

      (2) The authors used both cancer cell lines and iPSCs to showcase their hypothesis and multiple parameters to validate the role of TRF2 in hTERT regulation.

      (3) Comprehensive integration of the recent literature findings and implementation in the current study.

      (4) In vivo validation of the findings.

      (5) Rigorous controls and well-designed assays have been used.

      Weaknesses:

      (1) The authors should comment on the cell proliferation and morphology of the engineered cell lines with ST or LT.

      The cell proliferation and morphology of the engineered cells were monitored during experiments. With a doubling time of 16-18 hours, all the cancer cell line pairs used in the study were counted and seeded equally before experiments.

      No significant difference in morphology or cell count (before harvesting for experiments) was noted between the stable cell line pairs, namely HT1080 ST vs HT1080 LT, and HCT116 p53-null scrambled control vs HCT116 p53-null hTERC knockdown.

      MDAMB 231 cells were treated with guanine-rich telomere repeats (GTR) over a period of 12 days, as described in Methods. Because GTR treatment in serum-free media alternated every other day with replenishment in serum-supplemented media, the cells underwent periodic delays in proliferation (or transient arrest) aligned with the GTR oligo-feeding cycles and appeared somewhat larger than their parental untreated cells.

      Next, the cells with Cas9-telomeric sgRNA-mediated telomere trimming were maintained transiently (until 3 days after transfection). During this time, no significant change in morphology or cell proliferation was observed in any of the cell lines, namely HCT116 or HEK293T Gaussia luciferase reporter cells. iPSCs were also monitored, and no change in morphology or cellular proliferation was observed during the 5 days of post-transfection antibiotic selection.

      (2) Also, the entire study uses engineered cell lines, with artificially elongated or shortened telomeres that conclusively demonstrate the role of hTERT regulation by TRF2 in telomere-length dependent manner, but using ALT negative cell lines with naturally short telomere length vs those with long telomeres will give better perspective. Primary cells can also be used in this context.

      The reviewer correctly highlights (as we also acknowledge in the Discussion) that our study primarily utilizes engineered cell lines with artificially elongated or shortened telomeres. We agree that using ALT-negative cells with naturally short versus long telomeres would provide additional perspective in testing our hypothesis. However, a key challenge in this experimental setup is the inherent variation in TRF2 protein levels among these cell types—a parameter central to our hypothesis. Comparing observations across such non-isogenic cell line pairs would require extensive normalization for multiple factors and could introduce additional complexities, potentially raising more questions among scientific readers.

      We had also explored primary cells, specifically foreskin fibroblasts and MRC5 lung fibroblasts, as suggested by the reviewer. However, we encountered two significant challenges. To achieve a notable telomere length difference of at least 20%, these primary cells had to undergo a minimum of 25 passages. During this period, we observed a substantial decline in their proliferation capacity and an increased tendency toward replicative senescence. Additionally, we noted a significant reduction in TRF2 protein levels as the primary cells aged, consistent with findings from Fujita K et al., 2010 (Nat Cell Biol.), which reported p53-induced, Siah-1-mediated proteasomal degradation of TRF2. Due to these practical limitations, we focused on cancerous cell lines with an isogenic background, ensuring a controlled experimental framework. This, in turn, opens new avenues for future research to explore broader implications. Investigating other primary cell types that may not present these challenges could be a valuable direction for future studies.

      (3) The authors set up time-dependent telomere length changes by dox induction, which may differ from the gradual telomere attrition or elongation that occurs naturally during aging, disease progression, or therapy. This aspect should be explored.

      In this study, we utilized a Doxycycline-inducible hTERT expression system to modulate telomere length in cancer cells, aiming to capture any gradual changes that might occur upon steady telomerase induction or overexpression—an event frequently observed in cancer progression. We monitored telomere length and telomerase activity at regular intervals (Supplementary Figure 2), noting a gradual increase until a characteristic threshold was reached, followed by a reversal to the initial telomere length.

      While this model provides interesting insights in the context of cancer cells, it does not replicate the conditions of aging or therapeutic intervention. We agree that exploring telomere length-dependent regulation of hTERT in normal aging cells is an important avenue for future research. Investigating TRF2 occupancy on the hTERT promoter in response to telomere length alterations caused by therapeutic interventions (for example, the telomerase inhibitors telomestatin or imetelstat, or the telomere damage inducer 6-thio-2’-deoxyguanosine) would provide valuable insights and warrants further exploration.

      (4) How does the hTERT regulation by TRF2 in a TL-dependent manner affect the ETS binding on hTERT mutant promoter sites?

      In our previous study (Sharma et al., 2021, Cell Reports), we experimentally demonstrated that GABPA and TRF2 do not compete for binding at the mutant hTERT promoter (Figure 4M-R). Silencing GABPA in various mutant hTERT promoter cells did not increase TRF2 binding. While GABPA has been reported to show increased binding at the mutant promoter compared to the wild type (Bell et al., 2015, Science), no telomere length (TL) sensitivity has been noted yet. This manuscript shows that telomere alterations in hTERT mutant cells do not significantly increase TRF2 occupancy at the promoter, reinforcing our earlier finding that G-quadruplex formation is crucial for TRF2 recruitment. Since TRF2 binding does not increase significantly at the mutant promoter and does not compete with GABPA, TL-sensitive TRF2 binding is unlikely to directly influence ETS binding by GABPA. Hence, the increased GABPA binding to the mutant promoter reported in the literature likely remains independent of TL-sensitive TRF2 binding. However, this inference still needs to be tested experimentally in future work.

      (5) Stabilization of the G-quadruplex structures in ST and LT conditions along with the G4 disruption experimentation (demonstrated by the authors) will strengthen the hypothesis.

      We agree with the reviewer’s suggestion that stabilizing G-quadruplex (G4) structures in mutant promoter cells under ST and LT conditions would further strengthen our hypothesis. From our ChIP experiments on hTERT promoter mutant cells following G4 stabilization with ligands, as reported in Sharma et al. 2021 (Figure 5G), we observed that TRF2 occupancy was regained in the telomere-length unaltered versions of -124G>A and -146G>A HEK293T Gaussia luciferase cells (referred to as LT cells in the current manuscript).

      Based on these published findings, we anticipate a similar restoration of TRF2 binding in the short telomere (ST) versions, given the increased availability of TRF2 protein molecules, as proposed in our Telomere Sequestration Partitioning model.

      (6) The telomere length and the telomerase activity are not very consistent (Figure 2A, and S1A, Figure 4B and S3). Please comment.

      In this study, we employed both telomerase-dependent and independent methods for telomere elongation.

      HT1080 model: Telomere elongation resulted from constitutive overexpression of hTERC and hTERT, leading to a direct correlation with telomerase activity.

      HCT116 (p53-null) model: hTERC silencing in ST cells, a known limiting factor for telomerase activity, resulted in significantly lower telomerase activity and a 1.5-fold telomere length difference.

      MDAMB231 model: Guanine-rich telomeric repeat (GTR) feeding induced telomere elongation through recombinatorial mechanisms (Wright et al., 1996), leading to significant telomere length gain but no notable change in telomerase activity.

      HCT116 Cas9-telomeric sgRNA model: Telomere shortening occurred without modifying telomerase components, resulting in a minor, insignificant increase in telomerase activity (Figure 2A, S1).

      Regarding xenograft-derived HT1080 ST and LT cells (Figure 4B, S3), the observed variability in telomere length and telomerase activity may stem from infiltrating mouse cells, which naturally have longer telomeres and higher telomerase activity than human cells. Since the tumour masses in the reported assay were not sorted to exclude mouse cells, using species-specific markers or fluorescently labelled HT1080 cells in future experiments would minimize this bias. However, even though the telomere length and telomerase activity assays cannot distinguish human from mouse cells, the human-specific mRNA analyses (hTERT and hTERC levels) and ChIP experiments (TRF2 occupancy and H3K27me3 enrichment on the hTERT promoter) (Figure 4B–E) strongly support our conclusions.

      (7) Please comment on the other telomere-associated proteins or regulatory pathways that might contribute to hTERT expression based on telomere length.

      The current study provides experimental evidence that TRF2, a well-characterized telomere-binding protein, mediates crosstalk between telomeres and the regulatory region of the hTERT gene in a telomere length-dependent manner. Given the observed link between hTERT expression and telomere length, it is likely that additional telomere-associated proteins and regulatory pathways contribute to this regulation.

      The remaining shelterin complex components—POT1, hRap1, TRF1, TIN2, and TPP1—may play crucial roles in this context, as they are integral to telomere maintenance and protection. Additionally, several DNA damage response (DDR) proteins, which interact with telomere-binding factors and help preserve telomere integrity, could potentially influence hTERT regulation in a telomere length-dependent manner. However, direct interactions or regulatory roles would require further experimental validation. Another group of proteins with potential relevance in this mechanism are the sirtuins, which directly associate with telomeres and are known to positively regulate telomere length, undergoing repression upon telomere shortening. Notably, SIRT1 has been reported to interact with telomerase (Lee SE et al., 2024, Biochem Biophys Res Commun.), while SIRT6 has been implicated in TRF2 degradation and telomerase activation. Given their roles in telomere homeostasis, sirtuins may serve as key mediators of telomere length-dependent hTERT regulation.

      Beyond protein-mediated mechanisms like the Telomere Sequestration partitioning model, telomere length-dependent regulation of hTERT may also involve chromatin architecture. The Telomere Position Effect—Over Long Distances (TPE-OLD), a phenomenon whereby telomere conformation influences gene expression at distant loci, has been reviewed extensively (Kim et al., 2018, Differentiation).

      Reviewer #2 (Public review):

      Summary:

      Telomeres are key genomic structures linked to everything from aging to cancer. These structures at the ends of chromosomes protect them from degradation during replication and rely on a complex made up of the human telomerase RNA gene (hTERC) and human telomerase reverse transcriptase (hTERT). While hTERC is expressed in all cells, the amount of hTERT is tightly controlled. The main hypothesis being tested is whether telomere length itself could regulate the hTERT enzyme. The authors conducted several experiments with different methods to alter telomere length and measured the binding of key regulatory proteins to this gene. It was generally observed that telomere shortening leads to the recruitment of factors that reduce hTERT expression, and telomere lengthening has the opposite effect. To rule out direct chromatin looping between telomeres and hTERT as the driver of this effect, artificial constructs were designed and inserted a significant distance away, and similar results were obtained.

      Overall, the claims of telomere length-dependent regulation of hTERT are supported throughout the manuscript.

      Strengths:

      The paper has several important strengths. Firstly, it uses several methods and cell lines that consistently demonstrate the same directionality of the findings. Secondly, it builds on established findings in the field but still demonstrates how this mechanism is separate from that which has been observed. Specifically, designing and implementing luciferase assays in the CCR5 locus supports that direct chromatin looping isn't necessary to drive this effect with TRF2 binding. Another strength of this paper is that it has been built on a variety of other studies that have established principles such as G4-DNA in the hTERT locus and TRF2 binding to these G4 sites.

      Weaknesses:

      The largest technical weakness of the paper is that minimal replicates are used for each experiment. I understand that these kinds of experiments are quite costly, and many of the effects are quite large; however, experiments such as the flow cytometry or the iPSC telomere length and activity assays appear to be based on a single sample, and several are based on two, or at most three, biological replicates. If samples were added, the main effects would likely hold, and many of the assays using GAPDH as a control would result in significant differences between the groups. This unnecessarily weakens the strength of the claims.

      We appreciate the reviewer’s recognition of the resource-intensive nature of our experiments, and we are confident in the robustness of the observed results. Due to the project’s timeline constraints and the need for consistency across experiments, we have reported findings based on 3 biological replicates with appropriate statistical analysis.

      Regarding the fibroblast-iPSC model, we would like to clarify that we have presented data from two independent biological replicates, each consisting of a fibroblast and its derived iPS cell pair, rather than a single sample. Additionally, the Tel-FACS assay involved analyzing at least 10,000 events, ensuring statistical significance in all cases. Alongside this, we also conducted qRT-PCR-based telomere length determination assays. While both assays were performed, we chose to report the more sensitive Tel-FACS data in the manuscript to provide a clearer representation of the results.

      Another detail that weakens the confidence in the claims is that throughout the manuscript there are several examples of the control group with zero variance between any of the samples: e.g. Figure 2K, Figure 3N, and Figure 6G. It is my understanding that a delta delta method has been used for calculation (though no exact formula is reported and would assist in understanding). If this is the case, then an average of the control group would be used to calculate that fold change and variance would exist in the group. The only way I could understand those control group samples always set to 1 is if a tube of cells was divided into conditions and therefore normalized to the control group in each case. A clearer description in the figure legend and methods would be required if this is what was done and repeated measures ANOVA and other statistics should accompany this.

      We thank the reviewer for their valuable feedback. In response to the comment about the control group and error calculation, we would like to clarify our approach. In our previous analysis, we set the control group (Day 0) as 1 to calculate the fold change and did not include error bars, as there was no variation in the control group (since all values were normalized to 1). However, as per the reviewer’s suggestion, we will now include error bars on the Day 0 control group. These error bars will be calculated based on the standard deviation (SD) of the Ct values across the biological replicates for the control group. For the Day 10 and Day 24 time points, we retain the error bars that reflect the variance in fold change across replicates, as originally reported.

      This adjustment would allow for a clearer representation of the data and variance in the control group. We believe this addresses the reviewer’s concerns about the error calculation, and we shall update the figure legend and methods to reflect these changes. Statistical analysis, including ANOVA, was already applied as indicated in the figure.
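      The exact fold-change formula is not given in the response. As a hedged illustration only (an assumption, not the authors' pipeline), the standard Livak 2^-ΔΔCt calculation can normalize every replicate, control replicates included, to the mean control ΔCt. The control fold changes then scatter around 1 (their geometric mean is exactly 1) instead of all being fixed at 1, which gives the control group a visible spread. This is an alternative to deriving error bars from the SD of the control Ct values. The function names and Ct values below are hypothetical.

```python
from statistics import mean, stdev


def delta_ct(target_ct, reference_ct):
    """Per-replicate delta Ct: target-gene Ct minus reference-gene Ct."""
    return [t - r for t, r in zip(target_ct, reference_ct)]


def fold_changes(sample_dct, control_dct):
    """Livak 2^-ddCt fold change for each replicate of a group.

    The calibrator is the MEAN control delta Ct, so when the control group
    is passed in as the sample its fold changes vary around 1 rather than
    all being exactly 1.
    """
    calibrator = mean(control_dct)
    return [2.0 ** -(d - calibrator) for d in sample_dct]


# Hypothetical Ct values for three biological replicates per group.
control = delta_ct([24.1, 24.6, 23.8], [18.0, 18.2, 17.9])
treated = delta_ct([23.1, 23.4, 23.0], [18.0, 18.2, 17.9])

for name, group in (("control", control), ("treated", treated)):
    fc = fold_changes(group, control)
    print(name, round(mean(fc), 3), "+/-", round(stdev(fc), 3))
```

      With this approach the control bar still averages close to 1, but its error bar reflects real replicate-to-replicate variation, and the same per-replicate values can feed a repeated-measures or standard ANOVA.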

      A final technical weakness of the paper is the data in Figure 5 where the modified hTERT promoter was inserted upstream of the luciferase gene. Specifically, it is unclear why data was not directly compared between the constructs that could and could not form G4s to make this point. For this reason, the large variance in several samples, and minimal biological replicates, this data was the least convincing in the manuscript (though other papers from this laboratory and others support the claim, it is not convincing standalone data).

      We appreciate the reviewer's thoughtful feedback on the presentation of the luciferase assay data in Figure 5. The data for the wild-type hTERT promoter (capable of forming G4 structures) was previously reported in Figure 2G-K. To avoid redundancy in data presentation, we initially chose to report the results of the mutated promoter separately. However, we recognize that directly comparing the wild-type and mutated promoter constructs within the same figure would provide clearer context and strengthen the interpretation of the results. In light of this, we will revise Figure 5 in the updated manuscript to include the data for both constructs, ensuring a more comprehensive and informative comparison.

      The second largest weakness of the paper is formatting.

      When I initially read the paper without a careful reading of the methods, I thought that the authors did not have appropriate controls meaning that if a method is applied to lengthen, there should be one that is not lengthened, and when a method is applied to shorten, one which is not shortened should be analysed as well. In fact, this is what the authors have done with isogenic controls. However, by describing all samples as either telomere short or telomere long, while this simplifies the writing and the colour scheme, it makes it less clear that each experiment is performed relative to an unmodified. I would suggest putting the isogenic control in one colour, the artificially shortened in another, and the artificially lengthened in another.

      Similarly, the graphs in general should be labelled consistently. Figure 2 was the most confusing. I would suggest one dotted line with the cell lines above it and the method of either elongation or shortening below it, i.e., HT1080 above and hTERC overexpression below, and MDA-MB-231 above and guanine terminal repeats below, as was done on the right. Figure 2 readability would also be improved by putting hTERT promoter GAPDH (-ve control) under each graph that uses this (Panels B and C, not just Panel C). All information is contained in the manuscript, but one must currently flip between figure legends, methods, and figures to understand what was done, and this reduces clarity for the reader.

      We sincerely thank the reviewer for their constructive feedback on the formatting and clarity of the figures. We appreciate the time and effort taken to suggest ways to enhance the visual presentation and readability of the manuscript. We agree that clearer differentiation of the experimental groups would help avoid confusion, and we will consider ways to improve the visual organization, as much as possible. Additionally, we will work on restructuring the graphs for greater consistency in labeling and alignment, especially in Figure 2, to improve readability and reduce the need for cross-referencing between the figures, figure legends, and methods section. We will also ensure the hTERT promoter GAPDH (-ve control) label appears under all relevant graphs for consistency. We will make revisions to the figures in line with these suggestions to improve the overall clarity and flow of the manuscript, as much as possible.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study provides valuable and comprehensive information about SARS-CoV-2 seroprevalence during 2021 and 2022 in different regions of Bolivia. Moreover, data on immune responses against SARS-CoV-2 variants based on neutralization tests indicate the presence of several virus variants circulating in the Bolivian population. The seroprevalence evidence provided by the authors is solid across the study period, whereas the data on variant circulation are limited to the early stages of the pandemic.

      Strengths:

      The major strength of this study is that it provides nationwide seroprevalence estimates of infection and/or vaccination, based on antibodies against both the spike and the nucleocapsid proteins, in a large representative sample of sera collected at two time-points from all departments of Bolivia, giving insight into COVID-19 epidemiology. In addition, virus neutralization assay data were used to infer the circulation of four SARS-CoV-2 variants in the population during the study period. Overall, the study results provide an overview of the level of viral transmission and vaccination and insights into the spread of SARS-CoV-2 variants across the country.

      Weaknesses:

      Including the Lambda variant, which circulated in several neighboring countries (Peru, Chile, and Argentina) and had a significant impact on the COVID-19 pandemic in the region, would have strengthened the study by allowing its spread to be contrasted with that of Gamma. In addition, even though neutralizing antibodies can certainly reveal previous infections with SARS-CoV-2 variants in the population, this information is of limited value for estimating when specific variants circulated, given the heterogeneous effects that past infections, vaccinations, or a combination of both could have on the level of variant-specific neutralizing antibodies and/or their cross-neutralization capacity.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The conclusions of this paper are well supported by the data, particularly the seroprevalence data, which reliably reflect the epidemiology of COVID-19 in Bolivia and are consistent with seroprevalence trends in other low- and middle-income countries.

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community:

      Since this is the first study to assess indicators of immunity against SARS-CoV-2 in the population of Bolivia at a nationwide scale, the seroprevalence data provided by geographic region at two time-points can serve as a reference for potential retrospective global meta-analyses and for further exploring and comparing risk factors for infection, variant distribution, and the impact of infection and vaccination, yielding deeper insights into the evolution of the COVID-19 pandemic in Bolivia and in the region.

      Reviewer #2 (Public Review):

      Significance of the findings:

      In this study, blood donors were assessed using serology and viral neutralization assays to determine the prevalence of SARS-CoV-2 antibodies. Antibodies against S1 and NCP were used to distinguish between vaccination and natural infection, and variant-specific neutralizing titers were used to determine which variants the antibodies respond to. The study reports almost universal antibody prevalence and increases in antibodies against specific variants at different points, corresponding to circulating variants identified phylogenetically in neighbouring countries. The authors propose this approach for settings like Bolivia where genetic sequencing is not readily available. Unfortunately, there are significant limitations that reduce the utility of this approach: serological data become available only after the fact in a fast-moving pandemic and so are a poor alternative to phylogenetic data. Rather, serological information can supplement phylogenetic data and is most useful in estimating population-level immunity.

      (1) Considerations in interpreting the results:

      We appreciate the reviewer's valuable feedback, which will certainly enhance the quality of our manuscript. As a result, we have revised the text to address their suggestions as thoroughly as possible.

      a. Serology provides different information from phylogenetic sequencing of the viruses, and so both are important. Viral sequencing provides real-time information on circulating variants and indicates the proportion of each variant in circulation at any point; multiple variants are almost always spreading, but the fastest-spreading variant comes to dominate. Importantly, serology also captures asymptomatic infections, providing population estimates of infection that are not available through viral gene sequencing.

      We underscored this point in the introduction by incorporating the following sentences:

      “Seroprevalence studies are a valuable adjunct to active surveillance because they allow analysis of the level of immunity of a population to a specific pathogen without the need for prospective testing, and also provide information on the frequency of cases that do not attract medical attention (asymptomatic infections)(4).” and “To date, the circulation of SARS-CoV-2 variants has mainly been studied through molecular surveillance, giving the proportion of circulating variants in real time. Therefore, genomic surveillance and serology offer distinct yet complementary insights thus far.”

      b. A major concern in the interpretation of serology is that antibody titers vary markedly over time with rapid declines in the first year post-infection or post-vaccination. However, these declines vary depending on whether hybrid immunity is present. Disentangling this retrospectively is a challenge. A low antibody titer could reflect an infection that occurred a few months ago but may be below the threshold for positivity at the time of testing. There is also substantial individual variability in antibody responses.

      This limitation merits emphasis and has consequently been elaborated upon in the discussion section:

      “Secondly, our results are based on serological data and may not be strictly identical to the genomic data from a quantitative point of view, although they are likely to reflect similar trends and distributions (see below). The results could also be influenced by various factors, including significant individual variation in antibody responses, as well as the decline in antibody titers during the first months following infection or vaccination(31-34), and could therefore be slightly underestimated. As the complexity of SARS-CoV-2 antigen exposure histories increased among tested individuals, we observed a tendency for serological data to start diverging from genomic data. This suggests, as expected, that the effectiveness of this method would be greater if implemented early in an epidemic, when the occurrence of multiple infections with different variants or the administration of varying doses of vaccine in the analyzed population before or after infection (resulting in hybrid immunity) is still limited. However, to mitigate the potential challenges arising from complex antigen exposure, we employed straightforward criteria to identify the variant among the four tested in VNT that exhibited the highest value (cf. Methods), thereby likely indicating the main or most recent infection and minimizing the influence of cross-neutralization on the final outcomes. In addition, several approaches were used to analyze the results, including quantification of circulating antigenic groups and individual variants, yielding results that were comparable and closely aligned with the genomic data.”

      c. Serology becomes increasingly difficult to untangle when an individual has had doses of vaccine and multiple natural infections with different variants. Due to the importance of hybrid immunity in population risk to new variants, it would be useful for estimates of hybrid immunity to be generated based on anti-S1 and anti-NCP antibodies. From a population immunity perspective, this could be important in guiding future protection and boosting strategies.

      We estimated the hybrid immunity for each department in 2021 and 2022 based on the prevalence of anti-S1 and anti-NCP antibodies and added a new Supplementary Table 1. We also added a description of this table in the result section: “The estimated hybrid immunity, based on the prevalence of anti-S1 and anti-NCP antibodies, ranged from 51.4% in Pando to 73.6% in Potosí in 2021. By 2022, this increased to between 83.3% in Santa Cruz and 90.6% in Tarija (Supplementary Table 1).”

      d. Since there is cross-neutralization by the antibodies stimulated by each variant, it is important to establish the sensitivity and specificity of each of the neutralization assays in a panel comprising multiple variants. An assessment of the accuracy of the neutralization assay for each variant is needed to be confident that it is able to distinguish between variants.

      Assessing the performance of the VNT for each SARS-CoV-2 variant is a highly complex task. This evaluation requires samples with comprehensive data on vaccination and variant-specific infection to determine the specificity of each VNT for each variant. However, access to such samples for every newly emerging variant remains challenging. To circumvent this issue, we evaluated the circulation level of the γ, δ, and ο variants under increasingly stringent conditions, by calculating the proportion of the population with log2-ratio values of ≤0 (variant titer equal to or greater than D614G), ≤-1 (variant titer at least twice that of D614G), and ≤-2 (variant titer at least four times that of D614G).
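      To make the three stringency levels concrete, the short sketch below computes a per-sample log2-ratio and the proportion of samples at or below each threshold. It assumes the ratio is defined as log2(D614G titer / variant titer), which is the orientation implied by the threshold descriptions above (a value ≤ -1 means the variant titer is at least twice the D614G titer). The function names, the dictionary layout, and the titer values are hypothetical illustrations, not the study's data or analysis code.

```python
import math

# Stringency thresholds: variant titer >= 1x, >= 2x, and >= 4x the D614G titer.
THRESHOLDS = (0, -1, -2)


def log2_ratio(d614g_titer: float, variant_titer: float) -> float:
    """Assumed definition: log2(D614G / variant); <= 0 when the variant titer is at least as high."""
    return math.log2(d614g_titer / variant_titer)


def proportion_at_thresholds(samples: list[dict], variant: str) -> dict:
    """Fraction of samples whose log2-ratio for `variant` is <= each threshold."""
    ratios = [log2_ratio(s["D614G"], s[variant]) for s in samples]
    return {t: sum(r <= t for r in ratios) / len(ratios) for t in THRESHOLDS}


# Hypothetical reciprocal-dilution titers for three donors (not study data).
donors = [
    {"D614G": 40, "delta": 160},   # log2-ratio -2: delta titer 4x D614G
    {"D614G": 80, "delta": 80},    # log2-ratio 0: equal titers
    {"D614G": 160, "delta": 40},   # log2-ratio +2: D614G dominates
]
print(proportion_at_thresholds(donors, "delta"))
```

      Raising the stringency from ≤0 to ≤-2 shrinks the counted fraction, which is the intended effect: only samples with a clearly variant-dominated response are attributed to that variant.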

      e. Blood donors are notoriously poor representations of the general population in many countries, driven partly by whether donation is financially rewarded. For example, in the USA, drug addicts are disproportionately over-represented in blood donor populations as they use it as a source of money. The authors provide no information on whether the blood donor population in Bolivia is representative of the entire population. Comparison of the prevalence of specific disease markers in the general population and in blood donors could provide a signal of their comparability.

      This is a significant aspect addressed in point 3.

      (2) Please provide the sensitivity and specificity of each of the assays so that the reader can assess the degree of accuracy in the assay that claims that the prevalent antibodies are due to, for example, omicron.

      The sensitivity and specificity of the in vitro assays are now referenced in a previous study: “The sensitivity and specificity of the in vitro assays were described previously(23).”

      Neutralization assays are considered the gold standard for measuring neutralizing antibodies against SARS-CoV-2 and its variants, and they are widely used in seroprevalence studies. However, until now, no one has successfully evaluated the specificity and sensitivity of this assay for SARS-CoV-2 variants, as doing so requires sera from individuals exposed to a single variant, which are increasingly difficult to collect for each newly emerging variant. Nevertheless, sera from laboratory-infected animals (primarily hamsters) with single-variant exposure have enabled the antigenic characterization of SARS-CoV-2 variants through viral neutralization. This approach has shown that it is possible to distinguish between sera from individuals infected with different variants, even among the Omicron subvariants (Anna Z. Mykytyn et al. Antigenic cartography of SARS-CoV-2 reveals that Omicron BA.1 and BA.2 are antigenically distinct. Sci. Immunol. 7, eabq4450 (2022); Samuel H. Wilks et al. Mapping SARS-CoV-2 antigenic relationships and serological responses. Science 382, eadj0070 (2023)).

      (3) Please provide an assessment of the representativity of the blood donor population eg. Is the prevalence of hepatitis B serological markers in the blood donor population comparable with the prevalence of hepatitis B serological markers in the general population from community-based studies?

      A new sentence was included in the discussion to support considering the blood donor population a representative sample of the general population: “In addition, in Bolivia, blood donation is unrewarded, and blood donors appear to be quite representative of the general population. Indeed, routine screening for several infection markers (such as HIV or HBV) is conducted in all donors, and the prevalences of these markers do not differ from those observed in the general population. For example, UNAIDS data highlights a 0.4% HIV prevalence within the Bolivian general population, with significantly higher rates exceeding 25% observed in high-risk groups such as men who have sex with men(29). Moreover, Sheena et al. estimated a 0.6% prevalence of HBsAg in Bolivia in 2019(30). Bolivian national statistics from the National Blood Program of the Ministry of Health and Sports indicate that between 2019 and 2023, the proportion of HIV- and HBV-reactive units among screened blood donors ranged from 0.26% to 0.41% and 0.16% to 0.25%, respectively (Dr. Lissete Bautista’s personal communication).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study demonstrates the significant role of secretory leukocyte protease inhibitor (SLPI) in regulating B. burgdorferi-induced periarticular inflammation in mice. They found that SLPI-deficient mice showed significantly higher B. burgdorferi infection burden in ankle joints compared to wild-type controls. This increased infection was accompanied by infiltration of neutrophils and macrophages in periarticular tissues, suggesting SLPI's role in immune regulation. The authors strengthened their findings by demonstrating a direct interaction between SLPI and B. burgdorferi through BASEHIT library screening and FACS analysis. Further investigation of SLPI as a target could lead to valuable clinical applications.

      The conclusions of this paper are mostly well supported by data, but two aspects need attention:

      (1) Cytokine Analysis:

      The serum cytokine/chemokine profile analysis does not include TNF-alpha data. Given TNF-alpha's established role in inflammatory responses, comparing its levels between wild-type and infected B. burgdorferi conditions would provide valuable insight into the inflammatory mechanism.

      (2) Sample Size Concerns:

      While the authors note limitations in obtaining Lyme disease patient samples, the control group is notably smaller than the patient group. This imbalance should either be addressed by including additional healthy controls or explicitly justified in the methodology section.

      We thank the reviewer for the careful review and positive comments.

      (1) We did look into TNF-alpha levels in both WT and SLPI-/- mice with and without B. burgdorferi infection. At the serum level, using ELISA, we did not observe any significant difference among the four groups. At the gene expression level, using RT-qPCR on the tibiotarsal tissue, we also did not observe any significant differences. Our RT-qPCR result is consistent with a previous microarray study using whole murine joint tissue (DOI: 10.4049/jimmunol.177.11.7930), which did not show significant changes in TNF-alpha levels in C57BL/6 mice following B. burgdorferi infection. A brief discussion has been added, and these data are provided as Supplemental figure 4 in the revised manuscript, lines 334-339 and 756-763.

      (2) We agree with the reviewer that the control group is smaller than the patient group. Among the archived samples available, the number of adult healthy controls is limited. It has been shown that the serum level of SLPI in healthy volunteers averages about 40 ng/ml (DOI: 10.3389/fimmu.2019.00664 and 10.1097/00003246-200005000-00003). The median level in the healthy controls in our data was 38.92 ng/ml, which is comparable to these previous results. A brief discussion has been added in the revised manuscript, lines 364-369.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Yu and coworkers investigates the potential role of secretory leukocyte protease inhibitor (SLPI) in Lyme arthritis. They show that, after needle inoculation of the Lyme disease (LD) agent, B. burgdorferi, SLPI-deficient mice, compared to wild-type mice, suffer elevated bacterial burden, joint swelling and inflammation, pro-inflammatory cytokines in the joint, and levels of serum neutrophil elastase (NE). They suggest that SLPI levels in Lyme disease patients are diminished relative to healthy controls. Finally, they find that SLPI may interact directly with B. burgdorferi.

      Strengths:

      Many of these observations are interesting and the use of SLPI-deficient mice is useful (and has not previously been done).

      We appreciate the reviewer’s careful reading and positive comments.

      Weaknesses:

      (a) The known role of SLPI in dampening inflammation and inflammatory damage by inhibition of NE makes the enhanced inflammation in the joint of B. burgdorferi-infected mice a predicted result;

      We agree that the observation of the elevated NE level and the enhanced inflammation is theoretically likely. Indeed, that was the hypothesis we explored, and what is theoretically possible often does not turn out to occur. In addition, despite the known contribution of neutrophils to the severity of murine Lyme arthritis, the importance of neutrophil serine proteases and anti-proteases has not been specifically studied, and neutrophils secrete many factors. Therefore, our data fill an important gap in the knowledge of murine Lyme arthritis development and set the stage for further exploration of this hypothesis in the genesis of human Lyme arthritis.

      (b) The potential contribution of the greater bacterial burden to the enhanced inflammation is not addressed;

      We agree with the reviewer’s viewpoint that the increased infection burden in the tibiotarsal tissue of the infected SLPI-/- mice could contribute to the enhanced inflammation. A brief discussion of this possibility has been added in the revised manuscript, line 287-288.

      (c) The relationship of SLPI binding by B. burgdorferi to the enhanced disease of SLPI-deficient mice is not clear; and

      We agree with the reviewer that we have not shown the importance of the SLPI-B. burgdorferi binding in the development of periarticular inflammation. It is an ongoing project in our lab to identify the SLPI binding partner in B. burgdorferi. Our hypothesis is that SLPI could bind and inhibit an unknown B. burgdorferi virulence factor that contributes to murine Lyme arthritis. A brief discussion has been added in the revised manuscript, line 401-407.

      (d) Several methodological aspects of the study are unclear.

      We appreciate the critique. We have modified the methods section in greater detail in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      The authors investigated the role of secretory leukocyte protease inhibitors (SLPI) in developing Lyme disease in mice infected with Borrelia burgdorferi. Using a combination of histological, gene expression, and flow cytometry analyses, they demonstrated significantly higher bacterial burden and elevated neutrophil and macrophage infiltration in SLPI-deficient mouse ankle joints. Furthermore, they also showed direct interaction of SLPI with B. burgdorferi, which likely depletes the local environment of SLPI and causes excessive protease activity. These results overall suggest ankle tissue inflammation in B. burgdorferi-infected mice is driven by unchecked protease activity.

      Strengths:

      Utilizing a comprehensive suite of techniques, this is the first study showing the importance of anti-protease-protease balance in the development of periarticular joint inflammation in Lyme disease.

      We greatly appreciate the reviewer’s careful reading and positive comments.

      Weaknesses:

      Due to the limited sample availability, the authors investigated the serum level of SLPI in both Lyme arthritis patients and patients with earlier disease manifestations.

      We agree with the reviewer that it would be ideal to have more samples from Lyme arthritis patients. However, among the available archived samples, those from Lyme arthritis patients are limited. For the samples from patients with a single EM, symptoms persisted for 3-4 months after diagnosis, the same timeframe in which acute arthritis develops. A brief discussion has been added in the revised manuscript, lines 364-369.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 2, for histological scoring, do they have similar n numbers?

      In panel B, 20 infected WT mice and 19 infected SLPI-/- mice were examined. In panel D, 13 infected WT and SLPI-/- mice were examined. Without infection, WT and SLPI-/- mice do not develop spontaneous arthritis. Due to the slow breeding of the SLPI-/- mice, a small number of uninfected control animals were used. All the supporting data values are provided in the supplemental excel.

      (2) In Figure 3, for macrophage population analysis, maybe consider implementing Ly6G-negative gating strategy to prevent neutrophil contamination in macrophage population?

      We appreciate the reviewer’s suggestion. We have analyzed the data using the Ly6G-negative gating strategy and provided the results in Supplemental figure 1. The two gating strategies showed consistent results: a significantly higher percentage of infiltrating macrophages in the tibiotarsal tissue from infected SLPI-/- mice (lines 154-158 and 726-729).

      Reviewer #2 (Recommendations for the authors):

      (1) The investigators should address the possibility that much of the enhanced inflammatory features of infected SLPI-deficient mice are simply due to the higher bacterial load in the joint.

      We agree with the reviewer’s viewpoint that the increased infection burden in the tibiotarsal tissue of the infected SLPI-/- mice could contribute to the enhanced inflammation. A brief discussion of this possibility has been added in the revised manuscript, line 287-288.

      (2) Fig. 1. (A) There is no statistically significant difference in the bacterial load in the heart or skin, in contrast to the tibiotarsal joint. It would be of interest to know whether other tissues that are routinely sampled to assess the bacterial load, such as injection site, knee, and bladder, also harbored increased bacterial load in SLPI-deficient mice. (B) Heart and joint burden were measured at "21-28" days. The two time points should be analyzed separately rather than pooled.

      (A) We appreciate the reviewer’s suggestion. We agree that looking into the infection load in other tissues would be helpful. However, studies of murine Lyme arthritis have predominantly focused on the tibiotarsal tissue, which displays the most consistent and prominent swelling and is easy to observe and measure. Thus, we focused on the tibiotarsal joint in our study. (B) We collected the heart and joint tissue approximately 3 weeks post infection within a 3-day window, based on the feasibility and logistics of the laboratory. By “21-28 d”, we meant between 21 and 24 days post infection. We apologize for the mislabeling; it has been corrected in the revised manuscript. In the methods, we defined the timeframe as “Mice were euthanized approximately 3-week post infection within a 3-day window (between 21 to 24 dpi) based on the feasibility and logistics of the laboratory”, lines 464-466. In the results and figure legend, we corrected it to “between 21 to 24 dpi”.

      (3) Fig. 2. (A) The same ambiguity as to the days post-infection as cited above in Point 2B exists in this figure. (B) Panel B: Caliper measurements to assess joint swelling should be utilized rather than visual scoring. (In addition, the legend should make clear that the black circles represent mock-infected mice.)

      (A) The histology scoring and histopathology examination were performed at the same time as heart and joint tissue collection, approximately 3 weeks post infection within a 3-day window based on the feasibility and logistics of the laboratory. We apologize for the mislabeling; it has been corrected in the revised manuscript. (B) We appreciate the reviewer’s suggestion. However, in our extensive experience, caliper measurement can alter the assessment of swelling by placing pressure on the joints and does not produce consistent results. Double-blinded scoring was therefore performed. Histopathology examination was performed by an independent pathologist, who confirmed the histology scores and provided additional measurements.

      (4) Fig. 3. (A) See Point 2B. (B) For Panels C-E, uninfected controls are lacking.

      We apologize for this omission. Uninfected controls have been provided in Figure 3 in the revised manuscript.

      (5) Fig. 4. Some LD subjects were sampled multiple times (5 samples from 3 subjects with Lyme arthritis; 13 samples from 4 subjects with EM), and samples from the same individuals apparently are treated as biological replicates in the statistical analysis. In contrast, the 5 healthy controls were each sampled only once.

      We agree with the reviewer that the control group is smaller than the patient group. Among the archived samples available, the number of adult healthy controls is limited, and each was sampled once. We used these samples to establish the baseline level of SLPI in the serum. It has been shown that the serum level of SLPI in healthy volunteers averages about 40 ng/ml (DOI: 10.3389/fimmu.2019.00664 and 10.1097/00003246-200005000-00003). The median level in the healthy controls in our data was 38.92 ng/ml, which is comparable to these previous results. A brief discussion has been added in the revised manuscript, lines 364-369.

      (6) Fig. 5. (A) Panel A: does binding occur when intact bacteria are used? (B) Panels B, C: Were bacteria probed with PI to indicate binding likely to occur to surface? How many biological replicates were performed for each panel? Is "antibody control" a no SLPI control? What is the blue line?

      Actively growing B. burgdorferi were collected and used for binding assays. We do not permeabilize the bacteria for flow cytometry. Thus, all the binding detected occurs to the bacterial surface. Three biological replicates were performed for each panel. The antibody control is no SLPI control. For panel D, the bacteria were stained with Hoechst, which shows the morphology of bacteria. We apologize for the missing information. A complete and detailed description of Figure 5 has been provided in both methods and figure legend in the revised manuscript. 

      (7) Sup Fig. 1. (A) Panel A: Was this experiment performed multiple times? I.e., how many biological replicates? (B) Panel B: Strain should be specified.

      The binding assay to B. burgdorferi B31A was performed two times. In panel B, B. burgdorferi B31A3 was used. We apologize for the missing information. A complete and detailed description has been provided in the figure legend in the revised manuscript. 

      (8) Fig. S2. It is not clear that the condition (20% serum) has any bactericidal activity, so the potential protective activity of SLPI cannot be determined. (Typical serum killing assays in the absence of specific antibody utilized 40% serum.)

      In Fig. S2, panel B, the first two bars (without SLPI, with 20% WT antiserum) showed around 40% viability, indicating that 20% WT antiserum has bactericidal activity. The serum was collected from B. burgdorferi-infected WT mice at 21 dpi and should contain polyclonal antibodies against B. burgdorferi.

      Reviewer #3 (Recommendations for the authors):

      It was a pleasure to review! I congratulate the authors on this elegant study. I think the manuscript is very well-written and clearly conveys the research outcomes. I only have minor suggestions to improve the readability of the text.

      We greatly appreciate the reviewer’s recognition of our work.

      Line 92: Please briefly summarize the key results of the study at the end of the introduction section.

      We appreciate the reviewer’s suggestion. A brief summary has been added in the revised manuscript, line 93-103.

      Line 108: Why did significant inflammation occur only in the ankle joints of SLPI-/- mice? Could you please provide a brief explanation?

      Inflammation may also occur in other joints of B. burgdorferi-infected SLPI-/- mice, which have not been studied. Studies of murine Lyme arthritis have predominantly used the tibiotarsal tissue, which displays the most prominent swelling and is easy to observe and measure. Thus, we focused on the tibiotarsal joint in our study.

      Line 136: Please also include the gene names in Figure 3.

      We apologize for the omission. Gene names have been included in the figure legend in the revised manuscript.

      Line 181: Please briefly introduce BASEHIT. Why did you use this tool? What are the benefits?

      We appreciate the reviewer’s suggestion. We have provided a brief introduction on BASEHIT in the revised manuscript, line 216-218.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors address an important issue in Babesia research by repurposing cipargamin (CIP) as a potential therapeutic against selected Babesia spp. In this study, CIP demonstrated potent in vitro inhibition of B. bovis and B. gibsoni, with IC<sub>50</sub> values of 20.2 ± 1.4 nM and 69.4 ± 2.2 nM, respectively, and in vivo efficacy against Babesia spp. in a mouse model. The authors identified two key resistance mutations in the BgATP4 gene (BgATP4<sup>L921I</sup> and BgATP4<sup>L921V</sup>) and explored their implications through phenotypic characterization of the parasite using cell biological experiments, complemented by in silico analysis. Overall, the findings are promising and could significantly advance Babesia treatment strategies.

      Strengths:

      In this manuscript, the authors effectively repurpose cipargamin (CIP) as a potential treatment for Babesia spp. They provide compelling in vitro and in vivo data showing strong efficacy. Key resistance mutations in the BgATP4 gene are identified and analyzed through both phenotypic and in silico methods, offering valuable insights for advancing treatment strategies.

      Thank you for your insightful comments and for taking the time to review our manuscript.

      Weaknesses:

      The manuscript explores important aspects of drug repurposing and rational drug design using cipargamin (CIP) against Babesia. However, several weaknesses should be addressed. The study lacks novelty as similar research on cipargamin has been conducted, and the experimental design could be improved. The rationale for choosing CIP over other ATP4-targeting compounds is not well-explained. Validation of mutations relies heavily on in silico predictions without sufficient experimental support. The Ion Transport Assay has limitations and would benefit from additional assays like Radiolabeled Ion Flux and Electrophysiological Assays. Also, the study lacks appropriate control drugs and detailed functional characterization. Further clarity on mutation percentages, additional safety testing, and exploration of cross-resistance would strengthen the findings.

      We appreciate your feedback and the chance to improve our paper. We have addressed each comment below and hope these revisions resolve your concerns.

      Comment 1: It is commendable to explore drug repurposing, drug deprescribing, drug repositioning, and rational drug design, especially using established ATP4 inhibitors that are well-studied in Plasmodium and other protozoan parasites. While the study provides some interesting findings, it appears to lack novelty, as similar investigations of cipargamin on other protozoan parasites have been conducted. The study does not introduce new concepts, and the experimental design could benefit from refinement to strengthen the results. Additionally, the rationale for choosing CIP over other MMV compounds targeting ATP4 is not clearly articulated. Clarifying the specific advantages CIP may offer against Babesia would be beneficial. Finally, the validation of the identified mutations might be strengthened by additional experimental support, as reliance on in silico predictions alone may not fully address the functional impact, particularly given the potential ambiguity of the mutations (BgATP4 L to V and I).

      Thank you for your thoughtful feedback. We have addressed the concerns as follows: (1) Introduction of new concepts and experimental design: While our study primarily builds on existing frameworks, it provides novel insights into the interaction of CIP with Babesia parasites, which we believe contribute to the field. Regarding the experimental design, we acknowledge its limitations and have revised the manuscript to include additional experiments to strengthen the robustness of our findings. Specifically, we have added experiments on the detection of BgATP4-associated ATPase activity (Figure 3H), the evaluation of cross-resistance to antibabesial agents (Figures 5A and 5B), and the efficacy of CIP plus TQ combination in eliminating B. microti infection with no recrudescence in SCID mice (Figure 5C).

      (2) Rationale for choosing CIP over other MMV compounds targeting ATP4: We appreciate this point and have expanded the introduction section to articulate our rationale for selecting CIP (Lines 94-97). Specifically, CIP was chosen due to its previously demonstrated efficacy against Plasmodium and other protozoan parasites.

      (3) Validation of identified mutations: We agree that additional experimental data would strengthen the validation of the identified mutations. In response, we determined the ratio of wild-type to mutant parasites by Illumina NovaSeq 6000 sequencing to confirm the prevalence of the BgATP4 C-to-G and C-to-A mutations (Figure 2D).

      Comment 2: Conducting an Ion Transport Assay is useful but has limitations. Non-specific binding or transport by other cellular components can lead to inaccurate results, causing false positives or negatives and making data interpretation difficult. Indirect measurements, like changes in fluorescence or electrical potential, can introduce artifacts. To improve accuracy, consider additional assays such as:

      a. Radiolabeled Ion Flux Assay: tracks the movement of Na<sup>+</sup> using radiolabeled ions, providing direct evidence of ion transport.

      b. Electrophysiological Assay: measures ionic currents in real-time with patch-clamp techniques, offering detailed information about ATP4 activity.

      Thank you for highlighting the limitations of the ion transport assay and suggesting alternative approaches to improve accuracy. However, these assays require specialized equipment and expertise not currently available in our laboratory. We have acknowledged these limitations and included these alternative methods as future directions of the study. Thank you for these suggestions, which will undoubtedly enhance the rigor and depth of our research.

      Comment 3: In-silico predictions can provide plausible outcomes, but it is essential to evaluate how the recombinant purified protein and ligand interact and function at physiological levels. This aspect is currently missing and should be included. For example, incorporating immunoprecipitation and ATPase activity assays with both wild-type and mutant proteins, as well as detailed kinetic studies with Cipargamin, would be recommended to validate the findings of the study.

      Thank you for your insightful suggestions regarding the validation of in-silico predictions. We recognize the importance of evaluating the interaction and function of recombinant purified proteins and ligands at physiological levels to strengthen the study's findings.

      (1) Incorporating experimental validation:

      a. Immunoprecipitation assays: We agree that immunoprecipitation could provide valuable evidence of protein-ligand interactions. While this was not included in the current study due to limitations in sample availability, we plan to incorporate this assay in follow-up experiments.

      b. ATPase activity assays: Assessing ATPase activity in both wild-type and mutant proteins is a crucial step in validating the functional impact of the identified mutations. We included the results in the revised manuscript (Figure 3H).

      (2) Detailed kinetic studies with cipargamin: We appreciate the recommendation to conduct detailed kinetic analyses. These studies would provide deeper insights into the binding affinity and inhibition dynamics of cipargamin. We have included the results of these experiments in the current study (Figure 3I).

      Comment 4: The study lacks specific suitable control drugs tested both in vitro and in vivo. For accurate drug assessment, especially when evaluating drugs based on a specific phenotype, such as enlarged parasites, it is important to use ATP4 gene-specific inhibitors. Including similar classes of drugs, such as Aminopyrazoles, Dihydroisoquinolines, Pyrazoleamides, Pantothenamides, Imidazolopiperazines (e.g., GNF179), and Bicyclic Azetidine Compounds, would provide more comprehensive validation.

      Thank you for emphasizing the importance of including suitable control drugs. We acknowledge the absence of specific control drugs in the previous version of the manuscript. To date, no drug targeting ATP4 proteins in Babesia has been definitively identified. The suggested drugs could potentially disrupt the parasite's ability to regulate sodium levels by inhibiting PfATP4, a protein essential for its survival. This highlights PfATP4 as an attractive target for antimalarial drug development. However, further studies are required to evaluate whether these drugs exhibit similar activity against ATP4 homologs in Babesia.

      Comment 5: Functional characterization of CIP through microscopic examination and quantification for assessing parasite size enlargement is not entirely reliable. A Flow Cytometry-Based Assay is recommended instead (along with suitable control antiparasitic drugs). To effectively monitor Cipargamin's action, conducting time-course experiments with 6-hour intervals is advisable rather than relying solely on endpoint measurements. Additionally, for accurate assessment of parasite morphology, obtaining representative qualitative images using Scanning Electron Microscopy (SEM) or Transmission Electron Microscopy (TEM) for treated versus untreated samples is recommended for precise measurements.

      Thank you for your constructive feedback regarding the methods for functional characterization of CIP and the evaluation of parasite morphology.

      (1) Flow Cytometry-Based Assay: We agree that a flow cytometry-based assay would enhance the accuracy of detecting changes in parasite size and morphology. We will implement this method in future studies as our laboratory currently does not have the capability to conduct such experiments.

      (2) Microscopy for Morphology Assessment: We acknowledge the importance of obtaining high-resolution, representative images of treated and untreated samples. Utilizing Scanning Electron Microscopy (SEM) or Transmission Electron Microscopy (TEM) for qualitative analysis will significantly improve the precision of our morphological assessments. However, both methods have limitations.

      a. SEM: This technique images only the erythrocyte surface; it cannot visualize the parasite, which resides inside the erythrocyte.

      b. TEM: Because samples are fixed and sectioned, parasites may be cut longitudinally or in cross-section depending on their orientation, so their dimensions cannot be measured precisely. We therefore used TEM to observe alterations in the parasite's internal structure before and after treatment, as shown in Figure 3C.

      Comment 6: A notable contradiction observed is that mutant cells displayed reduced efficacy and affinity but more pronounced phenotypic effects. The BgATP4<sup>L921I</sup> mutation shows a 2x lower susceptibility (IC<sub>50</sub> of 887.9 ± 61.97 nM) and a predicted binding affinity of -6.26 kcal/mol with CIP. However, the phenotype exhibits significantly lower Na<sup>+</sup> concentration in BgATP4<sup>L921I</sup> (P = 0.0087) (Figure 3E).

      The seemingly contradictory combination of reduced CIP binding and efficacy in the BgATP4<sup>L921I</sup> mutant with a significant decrease in intracellular Na<sup>+</sup> concentration may be explained by factors other than direct CIP interaction. Although CIP likely binds less effectively to its target in the BgATP4<sup>L921I</sup> mutant, the observed phenotype may be attributable to the functional consequences of the mutation. The L921I substitution probably impairs the ion transport function of BgATP4 directly, disrupting Na<sup>+</sup> homeostasis independently of CIP. We therefore hypothesize that the dysregulated Na<sup>+</sup> homeostasis is driven by the mutation itself rather than by the already weakened inhibitory effect of CIP.

      Comment 7: The manuscript does not clarify the percentage of mutations, and the number of sequence iterations performed on the ATP4 gene. It is also unclear whether clonal selection was carried out on the resistant population. If mutations are not present in 100% of the resistant parasites, please indicate the ratio of wild-type to mutant parasites and represent this information in the figure, along with the chromatograms.

      Thank you for your valuable comments and for the opportunity to clarify these points. During long-term culture, parasites were subcultured every three days. Although clonal selection was not performed, mutant strains were effectively selected during this process. To determine the ratio of wild-type to mutant parasites, we performed high-throughput next-generation sequencing on the Illumina NovaSeq 6000 platform. For BgATP4<sup>L921V</sup>, 99.97% of 7,960 reads carried G, and for BgATP4<sup>L921I</sup>, 99.92% of 7,862 reads carried A. To enhance clarity, we have included a new figure (Figure 2D) illustrating these sequencing results.
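      To illustrate the arithmetic behind these percentages, the sketch below computes the fraction of reads carrying a given base at one position from per-base read counts. The function name `allele_fraction` is hypothetical, and the counts are illustrative only: the mutant-base counts are back-calculated from the reported totals (7,960 and 7,862 reads) and percentages, and the remaining reads are assumed to be wild-type C. They are not raw data from the study.

```python
def allele_fraction(base_counts: dict[str, int], allele: str) -> float:
    """Return the fraction of reads at a single position that carry `allele`."""
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no reads at this position")
    return base_counts.get(allele, 0) / total


# Illustrative counts only: mutant-base counts back-calculated from the reported
# percentages; the remaining reads are hypothetically assigned to wild-type C.
l921v = {"G": 7958, "C": 2}  # 7,960 reads in total
l921i = {"A": 7856, "C": 6}  # 7,862 reads in total

print(f"L921V: {allele_fraction(l921v, 'G'):.2%} of reads carry G")
print(f"L921I: {allele_fraction(l921i, 'A'):.2%} of reads carry A")
```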

      Comment 8: While the compound's toxicity data is well-established, it is advisable to include additional testing in epithelial cells and liver-specific cell lines (e.g., HeLa, HCT, HepG2) if feasible for the authors. This would provide a more comprehensive assessment of the compound's safety profile.

      Thank you for your thoughtful suggestion. We included toxicity testing in human foreskin fibroblasts (HFF) as supplemental toxicity data to provide a more comprehensive evaluation of the compound's safety profile (Figure supplement 1B).

      Comment 9: In the in vivo efficacy study, recrudescent parasites emerged after 8 days of treatment. Did these parasites harbor the same mutation in the ATP4 gene? The authors did not investigate this aspect, which is crucial for understanding the basis of recrudescence.

      Thank you for raising this important point. We acknowledge that understanding the genetic basis of recrudescence is critical for elucidating mechanisms of resistance and treatment failure. Although the current study did not include an analysis of the BrATP4 gene in relapsing parasites because of limited sample availability, we evaluated CIP efficacy in SCID mice and sequenced the BmATP4 gene in recrudescent samples; no mutations were identified (Lines 211-212). We believe that parasites relapsing after the 7-day treatment are unlikely to have readily acquired mutations.

      Comment 10: The authors should explain their choice of BALB/c mice for evaluating CIP efficacy, as these mice clear the infection and may not fully represent the compound's effectiveness. Investigating CIP efficacy in SCID mice would be valuable, as they provide a more reliable model and eliminate the influence of the immune system. The rationale for not using SCID mice should be clarified.

      We appreciate the reviewer's suggestion regarding the use of SCID mice to evaluate the efficacy of CIP. In response to your suggestion, we have now included an experiment using SCID mice to evaluate the efficacy of CIP and to eliminate the confounding influence of the immune system. We further investigated the potential of combined administration of CIP plus TQ to eliminate parasites, as we are concerned that the long-term use of CIP as a monotherapy may be limited due to its potential for developing resistance. The results are shown in Figure 5C.

      Comment 11: Do the in vitro-resistant parasites show any potential for cross-resistance with commonly used antiparasitic drugs? Have the authors considered this possibility, and what are their expectations regarding cross-resistance?

      Thank you for your insightful question regarding the potential for cross-resistance between in vitro-resistant parasites and commonly used antiparasitic drugs. In response to your suggestion, we have now included experiments to assess whether B. gibsoni parasites that are resistant to CIP exhibit any cross-resistance to other commonly used antiparasitic drugs, such as atovaquone (ATO) and tafenoquine (TQ). The IC<sub>50</sub> values for both ATO and TQ in the resistant strains showed only slight changes compared to the wild-type strain, with less than a onefold difference (Figure 5A, 5B). This minimal variation indicates a mild alteration in susceptibility to ATO and TQ, but not enough to indicate strong resistance or significant cross-resistance. These results suggest that CIP could be used in combination with TQ to treat babesiosis.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors have tried to repurpose cipargamin (CIP), a drug known to be active against Plasmodium and Toxoplasma, against Babesia. They demonstrated the efficacy of CIP against Babesia in the nanomolar range. In silico analyses revealed a drug resistance mechanism involving a single amino acid substitution at position 921 of the Babesia ATP4 protein. Overall, the conclusions drawn by the authors are well justified by their data. I believe this study opens up a novel therapeutic strategy against babesiosis.

      Strengths:

      The authors have carried out a comprehensive study. All the experiments performed were carried out methodically and logically.

      Thank you for the comments and your time to review our manuscript.

      Weaknesses:

      The introduction section needs to be more informative. The authors are investigating the binding of CIP to the ATP4 gene, but they did not give any information about the gene or how the ATP4 inhibitors work in general. The resolution of the figures is not good and the font size is too small to read properly. I also have several minor concerns which have been addressed in the "Recommendations for the authors" section.

      We thank the reviewer for their valuable comments. In response, we have revised the introduction to include a more detailed explanation of the ATP4 gene, its biological significance, and the mechanism of ATP4 inhibitors to provide a better context of the study (Lines 86-93). Additionally, we have reformatted the figures to enhance resolution and increased the font size to ensure improved readability. We also appreciate the reviewer's careful assessment of the manuscript and have addressed all minor concerns outlined in the "Recommendations for the Authors" section. A detailed, point-by-point response to each concern is provided in the response letter, and the corresponding revisions have been incorporated into the manuscript.

      Reviewer #3 (Public review):

      Summary:

      The authors aim to establish that cipargamin can be used for the treatment of infection caused by Babesia organisms.

      Strengths:

      The study provides strong evidence that cipargamin is effective against various Babesia species. In vitro growth assays were used to establish that cipargamin is effective against Babesia bovis and Babesia gibsoni. Infection of mice with Babesia microti demonstrated that cipargamin is as effective as the combination of atovaquone plus azithromycin. Cipargamin protected mice from lethal infection with Babesia rodhaini. Mutations that confer resistance to cipargamin were identified in the gene encoding ATP4, a P-type Na<sup>+</sup> ATPase that was found in other apicomplexan parasites, thereby validating ATP4 as the target of cipargamin.

      We appreciate the reviewer for taking the time to review our manuscript.

      Weaknesses:

      Cipargamin was tested in vivo at a single dose administered daily for 7 days. Despite the prospect of using cipargamin for the treatment of human babesiosis, there was no attempt to identify the lowest dose of cipargamin that protects mice from Babesia microti infection. Exposure to cipargamin can induce resistance, indicating that cipargamin should not be used alone but in combination with other drugs. There was no attempt at testing cipargamin in combination with other drugs, particularly atovaquone, in the mouse model of Babesia microti infection. Given the difficulty in treating immunocompromised patients infected with Babesia microti, it would have been informative to test cipargamin in a mouse model of severe immunosuppression (SCID or rag-deficient mice).

      We thank the reviewer for raising these important comments. We address each concern as follows:

      (1) Identifying the lowest protective dose of CIP:

      Although our current study was designed to assess the efficacy of CIP at a single therapeutic dose over a 7-day period, we acknowledge that identifying the lowest effective dose would provide valuable information for optimizing treatment regimens. We plan to address this in future studies by conducting a dose-response experiment to identify the minimal protective dose of CIP.

      (2) Testing CIP in combination with other drugs:

      In the current study, we have tested the efficacy of tafenoquine (TQ) combined with CIP, as well as CIP or TQ administered individually, in a mouse model of B. microti infection. Our results demonstrated that, compared with monotherapy, the combination of CIP and TQ completely eliminated the parasites within 90 days of observation (Figure 5C).

      (3) Testing in an immunocompromised mouse model:

      We agree with the reviewer that evaluating CIP in immunocompromised models is critical for understanding its potential in treating immunocompromised patients. To address this, we have conducted experiments using SCID mice infected with B. microti. Our results indicated that the combination therapy of CIP plus TQ was effective in eliminating parasites in the severely immunocompromised model (Figure 5D).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comment 1: Table: Include the in-silico binding energies for each mutation and ligand.

      We have added binding energies for each mutation and ligand in Table supplement 3.

      Comment 2: Did the authors investigate the potential of combination therapies involving CIP?

      We have tested the efficacy of TQ combined with CIP in a mouse model of B. microti infection.

      Comment 3: Does this mutation affect the transmission of the parasite?

      Based on our observations, the growth and generation rates of the mutant strain are comparable to those of the wild-type strain. These findings suggest that the mutation does not significantly affect the spread or transmission of the parasite. We have included this observation in the revised manuscript (Lines 243-244).

      Comment 4: 60: Use abbreviations CLN for clindamycin and QUI for quinine.

      We have revised them accordingly (Lines 59-60).

      Comment 5: 86: The hypothesis is not strong or convincing; it needs to be modified to be more specific and convincing.

      We have revised the hypothesis to reflect the rationale behind the study better and to support our claim more strongly (Lines 94-97).

      Comment 6: 93: Change to: "In vitro efficacy of CIP against B. bovis and B. gibsoni.".

      We have changed the suggested content in the manuscript (Line 104).

      Comment 7: 96: Define CC<sub>50</sub>.

      We have added the definition of CC<sub>50</sub> (Line 106).

      Comment 8: 102: Change to: "...Balb/c mice increased dramatically in the...".

      We have changed the word following your recommendation (Line 114).

      Comment 9: 108: "...significant decrease at 12 DPI...".

      We have revised it according to your suggestion (Line 120).

      Comment 10: 110: "This indicates that the administration...".

      We have revised it according to your suggestion (Line 122).

      Comment 11: Figure 1:

      (1) Panels A and B should clearly indicate parasite species within the graph for better self-explanation.

      We have indicated parasite species within the graph.

      (2) For panels C, D, and E, if mice were eliminated or euthanized in the study, include a symbol in the graph to indicate this.

      For panels C and D, no mice were eliminated during the study; therefore, no symbol was added to these graphs. Panel F already provides information about the number of eliminated mice, which corresponds to the data in Panel E.

      (3) In panels C, D, and E, use a continuation arrow for drug treatment rather than a straight line, to cover the duration of the treatment.

      We have updated the figures to use continuation arrows instead of straight lines to represent the duration of drug treatment.

      Comment 12: Figure 2: The color combination for the WT and mutant curves is hard to read; consider using regular, less fluorescent, and more distinguishable colors.

      We have adjusted the color scheme to use more distinguishable and less fluorescent colors, ensuring better readability and clarity. The revised figure with the updated color scheme has been included in the updated manuscript, and we hope this resolves the readability concern.

      Comment 13: Figure 3:

      (1) Panel A: Represent a single infected iRBC rather than a field for better visualization.

      We have updated Panel A to display a single infected iRBC instead of a field.

      (2) Panels E and F: Change the color patterns, as the current colors, especially the green variants (WT and mutant L921V), are difficult to read.

      To improve readability, we have updated the color patterns for these panels by selecting more distinguishable colors with higher contrast (Figure 3F, 3G).

      Comment 14: Figure 4: Panels B, C, and D: The text is too small to read; increase the font size or change the resolution.

      We have increased the font size and replaced the panels with high-resolution versions (Figure 4B, 4C, 4D).

      Reviewer #2 (Recommendations for the authors):

      Comment 1: In the last paragraph of the introduction, the authors mentioned determining the activity of CIP in vitro in B. bovis and B. gibsoni while in vivo in B. microti and B. rodhaini. It is not explained why they are testing the in vitro and in vivo effects on different Babesia species. Could you please add some logic there? Also, why did they mention measuring the inhibitory activity of CIP by monitoring the Na<sup>+</sup> and H<sup>+</sup> balance? This part needs to be rewritten with more information. The ATP4 gene is not properly introduced in the manuscript.

      We thank the reviewer for raising these important points. Below, we address each aspect of the comment in detail:

      (1) Rationale for testing different Babesia spp. in vitro and in vivo:

      B. bovis and B. gibsoni are well-established Babesia models for in vitro culture systems, allowing evaluation of CIP's inhibitory activity under controlled laboratory conditions. B. microti and B. rodhaini, on the other hand, are commonly used rodent models for the in vivo studies of babesiosis, enabling the assessment of drug efficacy in a mammalian host system. This multi-species approach provides a comprehensive evaluation of CIP's efficacy across Babesia spp. with different biological characteristics.

      (2) Measuring CIP's inhibitory activity via Na<sup>+</sup> and H<sup>+</sup> balance:

      We acknowledge that this section of the introduction requires more context. The revised manuscript now includes additional information explaining that the ATP4 gene, which encodes a Na<sup>+</sup>/H<sup>+</sup> transporter, is the proposed target of CIP (Lines 86-93). CIP disrupts the ion homeostasis maintained by ATP4, leading to an imbalance in Na<sup>+</sup> and H<sup>+</sup> concentrations. Monitoring these ionic changes provides a mechanistic understanding of CIP's mode of action and its impact on parasite viability. This rationale has been expanded in the introduction to clarify its significance.

      Comment 2: The figure fonts are too small. The resolution for the images is also poor.

      We have increased the font size in all figures to improve readability. Additionally, we have replaced the figures with high-resolution versions to ensure clarity and visual quality.

      Comment 3: Figures 1A and 1B: one of the error bars merged to the X-axis legend. Please modify these panels. Which curve was used to determine the IC<sub>50</sub> values (although it's mentioned in the methods section, would it be better to have the information in the figure legends as well)?

      We thank the reviewer for their comments regarding Figures 1A and 1B.

      (1) Error bars overlapping the X-axis legend:

      The error bars in the figures were automatically generated using GraphPad Prism 9 based on the data and are determined by the values themselves. Unfortunately, this overlap cannot be avoided without altering the data representation.

      (2) IC<sub>50</sub> curve information:

      To clarify the determination of IC<sub>50</sub> values, we have already included gray dashed lines in the graphs to indicate where the IC<sub>50</sub> values were derived from the curves. This visual representation provides clear information about the IC<sub>50</sub> points.

      Comment 4: Supplementary Figure 1: what are MDCK cells? What is CC<sub>50</sub>? Please mention their full forms in the text and figure legends (they should be described here because the methods section comes later). What is meant by a predicted selectivity index? There should be an explanation of why and how they did it. Which curve was used to determine the IC<sub>50</sub> values?

      We thank the reviewer for pointing out the need to clarify terms and provide additional context in the supplementary figure and text. We have updated the figure legend and text to include the full forms of MDCK (Madin-Darby canine kidney) cells and CC<sub>50</sub> (50% cytotoxic concentration), ensuring clarity for readers encountering these terms for the first time. In the text, we have now included a brief explanation of the selectivity index as a measure of a drug's safety and specificity (Lines 108-110). The selectivity index is calculated as the ratio of the 50% cytotoxic concentration (CC<sub>50</sub>) to the half maximal inhibitory concentration (IC<sub>50</sub>), i.e., CC<sub>50</sub>/IC<sub>50</sub> (Lines 333-335). Gray dashed lines in the graphs indicate where the IC<sub>50</sub> values were derived from the curves (Figure supplement 1).
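      A minimal sketch of the selectivity index calculation, assuming the conventional definition SI = CC<sub>50</sub>/IC<sub>50</sub>, where higher values indicate a wider margin between host-cell toxicity and antiparasitic activity. The IC<sub>50</sub> values are those reported above for B. bovis and B. gibsoni; the CC<sub>50</sub> value and the function name `selectivity_index` are hypothetical placeholders, not the MDCK or HFF measurements from the study.

```python
def selectivity_index(cc50_nm: float, ic50_nm: float) -> float:
    """Selectivity index = CC50 / IC50 (both in the same concentration units)."""
    if cc50_nm <= 0 or ic50_nm <= 0:
        raise ValueError("CC50 and IC50 must be positive")
    return cc50_nm / ic50_nm


# Hypothetical CC50 used only for illustration; not a measured value.
HYPOTHETICAL_CC50_NM = 10_000.0
# IC50 values (nM) as reported in the reviews.
reported_ic50_nm = {"B. bovis": 20.2, "B. gibsoni": 69.4}

for species, ic50 in reported_ic50_nm.items():
    si = selectivity_index(HYPOTHETICAL_CC50_NM, ic50)
    print(f"{species}: SI = {si:.1f}")
```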

      Comment 5: Figures 1C-F: It feels unnecessary to write down n=6 for each panel and each group. Since "n" is equal for all, it would be nice to just mention it in the figure legend only.

      We appreciate the reviewer's suggestion regarding the notation of "n=6" in Figures 1C-F. To improve clarity and reduce redundancy, we have removed the "n=6" notation from the individual panels and included it in the figure legend instead.

      Comment 6: Figure 2A: was never mentioned in the text.

      We have described the sequencing results for the wild-type B. gibsoni ATP4 gene with a reference to Figure 2A in the revised manuscript (Lines 134-135).

      Comment 7: Figure 2D: some of the error bars merged to the X-axis legend. Please modify. Again, which curve was used to determine the IC<sub>50</sub> values? Can the authors explain why the pH declined after 4 minutes?

      We thank the reviewer for this insightful question.

      (1) Error bars overlapping the X-axis legend:

      The error bars in Figure 2E were automatically generated using GraphPad Prism 9 and are determined by the underlying data values. Unfortunately, this overlap cannot be avoided without altering the data representation.

      (2) IC<sub>50</sub> curve information:

      Since Figure 2E contains three separate curves, adding dashed lines to indicate the IC<sub>50</sub> for each curve would make the figure overly cluttered and reduce readability. To address this, we have clearly indicated the IC<sub>50</sub> values in Figures 1A and 1B and described the methodology for determining IC<sub>50</sub> values in the Methods section. We believe this approach provides sufficient clarity without compromising the visual experience of Figure 2E.

      (3) The pH decline observed after 4 minutes (Figure 3E) may be attributed to the following factors:

      a. Ion transport dynamics:

      The initial rise in pH likely reflects the rapid inhibition of Na<sup>+</sup>/H<sup>+</sup> exchange mediated by CIP, which temporarily alkalinizes the intracellular environment. However, after this initial phase, compensatory mechanisms, such as proton influx or metabolic acid production, may lead to a subsequent decline in pH.

      b. Drug kinetics and target interaction:

      The decline could also result from the time-dependent effects of CIP on ATP4-mediated ion transport. As the drug action stabilizes, the parasite may partially restore ionic balance, leading to a decrease in intracellular pH.

      Comment 8: Supplementary Figure 2: It is difficult to distinguish between the red and pink colors, so it would be wise to use two contrasting colors to distinguish between the Pf and Tg CIP-resistance sites.

      We have updated the figure to enhance clarity. Purple squares and arrows now represent sites linked to P. falciparum CIP resistance, replacing the previous red squares. Similarly, gray squares and arrows have replaced the green squares to denote sites associated with T. gondii (Figure supplement 2).

      Comment 9: Line 65: Is it possible to add a reference here?

      We have added a reference in line 65.

      Comment 10: Line 69: Please spell the full form of G6PD as it was never mentioned before.

      We have added the full form of G6PD in lines 69-70.

      Comment 11: Line 103: mention what DPI is (irrespective of the methods section which comes later).

      We have spelled out DPI (days postinfection) in line 115.

      Comment 12: Line 120: It is not explained why the B. gibsoni ATP4 gene was investigated. There should be more explanation and references to previous work.

      We thank the reviewer for pointing out the need to provide more context for investigating the B. gibsoni ATP4 gene. To address this, we have added more information to the introduction, explaining that the ATP4 gene, which encodes a Na<sup>+</sup>/H<sup>+</sup> transporter, is the proposed target of CIP (Lines 86-93).

      Comment 13: Line 203-219: line spacing seems different from the rest of the manuscript.

      We have corrected the formatting (Lines 262-278).

      Reviewer #3 (Recommendations for the authors):

      Comment 1: Lines 66-68: The report by Marcos et al. 2022 did not demonstrate that tafenoquine was effective in curing relapsing babesiosis. In the discussion of that article, the authors state that "it is impossible to conclude that the drug tafenoquine provided any clinical benefit." The first demonstration of tafenoquine efficacy against relapsing babesiosis was reported by Rogers et al. 2023 and confirmed by Krause et al. 2024. Please rephrase the statement and use relevant citations.

      We thank the reviewer for pointing out this issue and we have rephrased the statement and used relevant citations (Lines 66-68).

      Comment 2: Line 103: mean parasitemia at 10 DPI is reported to be 35.88% but Figure 1C appears to indicate otherwise.

      We apologize for this error. The correct mean parasitemia at 10 DPI is 38.55%, and this has been updated in line 115 of the revised manuscript to reflect the data shown in Figure 1C.

      Comment 3: Line 116: parasitemia is said to recur on day 14 post-infection but Figure 1E indicates that recurrence was already noted on day 12 post-infection.

      We thank the reviewer for pointing out this inconsistency. We have corrected the relapse day to reflect that recurrence was noted on day 12 post-infection, as shown in Figure 1E. This correction has been made in the revised manuscript (Line 128).

      Comment 4: Line 120: Replace "wells" with "strains". Also, start the paragraph with one brief sentence to state how resistant parasites were generated.

      We have replaced "wells" with "strains" and added one brief sentence to explain how resistant parasites were generated (Lines 132-134).

      Comment 5: Line 169: is Ji et al., 2022b truly the appropriate reference to support a statement on tafenoquine?

      We thank the reviewer for highlighting this point. We have added a further reference to support the statement on tafenoquine. The IC<sub>50</sub> value of TQ was 20.0 ± 2.4 μM against B. gibsoni (Ji et al., 2022b), and 31 μM against B. bovis (Carvalho et al., 2020) (Lines 223-225).

      Comment 6: Lines 184-185: given that exposure to CIP induces mutations in the ATP4 gene and therefore resistance to CIP, what is the prospect of using CIP for the treatment of babesiosis? Can the authors speculate on whether CIP should not be used alone but rather in combination with other drugs currently used for the treatment of human babesiosis?

      We thank the reviewer for raising this important question. Given that exposure to CIP induces mutations in the ATP4 gene, leading to resistance, we acknowledge that the long-term use of CIP as a monotherapy may be limited due to the potential for resistance development. To address this concern, we investigated the combination therapy of TQ and CIP to achieve the complete elimination of B. microti in infected mice (a model for human babesiosis). The results of this study are presented in Figure 5C.

      Comment 7: Lines 258-259: it is stated that drug treatment was initiated on day 4 post-infection when mean parasitemia was 1% and that drug treatment was continued for 7 days. This is not the case for B. rodhaini infection. As reported in Figure 1E, treatment was initiated on day 2 post-infection.

      We apologize for the oversight and any confusion caused. We have corrected the statement to reflect that drug treatment for B. rodhaini-infected mice was initiated at 2 DPI, as reported in Figure 1E (Lines 347-349).

      Comment 8: Lines 282-285: RBCs are said to be exposed to CIP for 3 days but parasite size is said to be measured on day 4. Which is correct?

      We thank the reviewer for pointing out this discrepancy. To clarify, the infected erythrocytes were exposed to CIP for three consecutive days (72 hours). Blood smears were then prepared at the 73<sup>rd</sup> hour, corresponding to the fourth day.

      Comment 9: Lines 35-37: this sentence can be omitted from the abstract as it does not summarize additional insight or additional data.

      We have omitted this sentence from the abstract.

      Comment 10: Line 55: replace Drews et al. 2023 with Gray and Ogden 2021 (doi: 10.3390/pathogens10111430). This excellent article directly supports the statement made by the authors.

      We appreciate the reviewer's suggestion and have replaced the reference with Gray and Ogden, 2021 (doi: 10.3390/pathogens10111430) (Line 54).

      Comment 11: Line 55: modify the start of sentence to read "The disease is known as babesiosis ...".

      We have modified the sentence (Line 54).

      Comment 12: Line 56: rephrase to read ".... but chronic infections can be asymptomatic".

      We have modified the sentence (Line 55).

      Comment 13: Line 57: rephrase to read "The fatality rate ranges from 1% among all cases to 3% among hospitalized cases but has been as high as 20% in immunocompromised patients."

      We have rephrased the sentence (Lines 55-57).

      Comment 14: Line 61: replace Holbrook et al. 2023 with Krause et al. 2021 (doi: 10.1093/cid/ciaa1216).

      We have replaced Holbrook et al. 2023 with Krause et al. 2021 (doi: 10.1093/cid/ciaa1216) (Line 60).

      Comment 15: Line 62: rephrase to read "... cytochrome b, which is targeted by atovaquone, were identified in patients with relapsing babesiosis." Here, also cite Lemieux et al., 2016; Simon et al., 2017; Rosenblatt et al, 2021, Marcos et al., 2022; Rogers et al., 2023; Krause et al., 2024.

      We have rephrased the sentence and cited the suggested references (Lines 61-64).

      Comment 16: Line 65: rephrase "Despite its efficacy, this combination can elicit adverse drug reactions (Vannier and Krause, 2012)."

      We have rephrased the sentence (Lines 65-66).

      Comment 17: Lines 75-77: rephrase to read "... of the drug indicated that CIP taken orally had good absorption, a long half-life, and ...".

      We have rephrased the sentence (Lines 76-77).

      Comment 18: Line 79: remove "the".

      We have removed "the" (Lines 79-80).

      Comment 19: Lines 83-85: rephrase to read "Mice infected with T. gondii that were treated with CIP on the day of infection and the following day had 90% fewer parasites 5 days post-infection (Zhou et al., 2014).".

      We have rephrased the sentence (Lines 83-85).

      Comment 20: Line 90: shorten the sentence to end as follows "... of CIP on Babesia parasites.".

      We have shortened the sentence as suggested (Line 100).

      Comment 21: Line 96: spell out CC<sub>50</sub>.

      We have spelled out the full form of CC<sub>50</sub> (Line 106).

      Comment 22: Line 104: remove "of body weight".

      We have removed "of body weight" (Line 116).

      Comment 23: Line 108: delete "from 8 DPI to 24 DPI, with statistically significant decreases".

      We have deleted "from 8 DPI to 24 DPI, with statistically significant decreases" (Line 120).

      Comment 24: Line 111: start a new paragraph with the sentence "BALB/c mice infected ...".

      We have started a new paragraph with the sentence "BALB/c mice infected ..." (Line 124).

      Comment 25: Line 123: replace "showed" with "occurred".

      We have replaced "showed" with "occurred" (Line 138).

      Comment 26: Line 127: rephrase to read "... sensitivity of the resistant parasite lines ...".

      We have rephrased the sentence (Line 144).

      Comment 27: Lines 137-140: rephrase to read ".... lines were lower when compared with ..." .

      We have rephrased the sentence (Line 158).

      Comment 28: Line 149: replace "BgATP4" with "B. gibsoni ATP4".

      We have replaced "BgATP4" with "B. gibsoni ATP4" (Line 183).

      Comment 29: Line 154: spell out "pLDDT" prior to pLDDT.

      We have provided the full form of pLDDT in the revised manuscript (Line 188).

      Comment 30: Lines 165-166: rephrase to read "CIP is a novel compound that inhibits Plasmodium development by targeting ATP4 and has been ...".

      We have rephrased the sentence (Lines 219-220).

      Comment 31: Lines 171-172: rephrase to read "...AZI, the combination recommended by the CDC in the United States.

      We have rephrased the sentence (Lines 226-227).

      Comment 32: Line 173: rephrase to read "... B. rodhaini infection, with survival up to 67%.".

      We have rephrased the sentence (Line 228).

      Comment 33: Lines 175-178: rephrase to read "In a previous study, a P. falciparum Dd2 strain that acquired resistance to CIP carried the G358S mutation in the ...".

      We have rephrased the sentence (Lines 230-231).

      Comment 34: Lines 179-180: rephrase to read "ATP4 is found in the parasite plasma membrane and is specific to the subclass of apicomplexan parasites.".

      We have rephrased the sentence (Lines 232-233).

      Comment 35: Lines 182-184: rephrase to read "In another study of Toxoplasma gondii, a cell line that carried the mutation G419S in the TgATP4 gene was 34 times ...".

      We have rephrased the sentence (Lines 235-237).

      Comment 36: Lines 201-202: delete the last sentence of this paragraph.

      We have deleted the last sentence of the paragraph (Line 261).

      Comment 37: Line 228: rephrase to read "... that CIP had a weaker binding to BgATP4<sup>L921I</sup> than to BgATP4<sup>L921V</sup>.".

      We have rephrased the sentence (Lines 294-295).

      Comment 38: Lines 261-262: please state that drugs were prepared in sesame oil. Add "20 mg/kg" in front of AZI.

      We have stated that drugs were prepared in sesame oil and added "20 mg/kg" in front of AZI (Lines 350-352).

      Comment 39: Line 265: replace "care" with "treatments".

      We have replaced "care" with "treatments" (Line 355).

      Comment 40: Line 267: replace "observe" with "assess".

      We have replaced "observe" with "assess" (Line 357).

      Comment 41: Lines 269-271: please provide the absolute numbers of B. gibsoni infected RBCs and the absolute numbers of uninfected RBCs that were added to the culture medium.

      We thank the reviewer for this suggestion. In the revised manuscript, we have included the absolute numbers of B. gibsoni-infected RBCs and uninfected RBCs added to the culture medium. Specifically, the culture medium contained 10 μL (5×10<sup>6</sup>) B. gibsoni iRBCs mixed with 40 μL (4×10<sup>8</sup>) uninfected RBCs (Lines 360-361).

      Comment 42: Line 279: replace "confirmed" with "identified".

      We have replaced "confirmed" with "identified" (Line 370).

      Comment 43: Figure Supplement 2: the squares are not readily visible. Could the entire column corresponding to the mutation position be highlighted?

      We thank the reviewer for this suggestion. To improve visibility, we have changed the color of the squares and added arrows to make the mutation sites as prominent as possible. Unfortunately, due to software limitations, we were unable to highlight the entire column corresponding to the mutation position.

      Comment 44: Figure Supplement 4: for the parasite that carries a mutation in BgATP4, please delete the arrows that are next to BgATP4. These arrows send the message that the mutated ATP4 has an active role in pumping Na<sup>+</sup> and H<sup>+</sup> back into their compartments, which is not the case.

      We thank the reviewer for their observation. The dotted arrows next to BgATP4 are intended to indicate the recovery of H<sup>+</sup> and Na<sup>+</sup> balance facilitated by the mutated ATP4, which reduces susceptibility to ATP4 inhibitors. To avoid potential confusion, we have revised the figure legend to clearly explain the role of the arrows, ensuring the intended message is accurately conveyed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      As our understanding of the immune system increases, it becomes clear that murine models of immunity cannot always provide an accurate model system for human immunity. However, mechanistic studies in humans are necessarily limited. To bridge this gap, many groups have worked on developing humanised mouse models in which human immune cells are introduced into mice, allowing their fine manipulation. However, since human immune cells will attack murine tissues, it has proven complex to establish a human-like immune system in mice. To help address this, Vecchione et al. have previously developed several models using human cell transfer into mice with or without human thymic fragments that allow negative selection of autoreactive cells. In this report they focus on the examination of the function of the B-helper CD4 T-cell subsets T-follicular helper (Tfh) and T-peripheral helper (Tph) cells. They demonstrate that these cells are able to drive autoantibody production and can also induce B-cell-independent autoimmunity.

      Strengths:

      A strength of this paper is that currently there is no well-established model for Tfh or Tph in HIS mice and that currently there is no clear murine Tph equivalent making new models for the study of this cell type of value. Equally, since many HIS mice struggle to maintain effective follicular structures Tfh models in HIS mice are not well established giving additional value to this model.

      Weaknesses:

      A weakness of the paper is that the models seem to lack a clear ability to generate germinal centres. For Tfh it is unclear how we can interpret their function without the structure where they have the greatest influence. In some cases, the definition of Tph does not seem to differentiate well between Tph and highly activated CD4 T-cells in general.

      The limited ability of HIS mice to generate well-defined lymphoid tissue structures is well noted. While the emergence of T cells in HIS mice increases the size of lymphoid tissues, the structure remains suboptimal and vaccination responses are limited. We believe this is mainly due to the common gamma chain knockout, which results in a lack of murine lymphoid tissue inducer (LTi) cells, which require IL-7 signaling to interact with murine mesenchymal cells for normal lymphoid tissue development. Ongoing efforts by our group and others aim to address this challenge by providing the necessary signals. Despite this challenge, these mice do develop Tfh cells, allowing us to study this cell subset.

      We agree with the reviewer that the distinction between Tph and highly activated CD4 T cells is incomplete.

      However, we have provided several distinctions in our manuscript that support the presence of Tph in HIS mice: 1) Tph cells exhibit very high levels of PD-1 expression, whereas other activated CD4 cells have varying levels of PD-1 expression. 2) Tph cells express IL-21. 3) Tph cells promote B cell differentiation and antibody production. 

      Reviewer #2 (Public Review):

      Summary:

      Humanized mice, developed by transplanting human cells into immunodeficient NSG mice to recapitulate the human immune system, are utilized in basic life science research and preclinical trials of pharmaceuticals in fields such as oncology, immunology, and regenerative medicine. However, there are limitations to using humanized mice for mechanistic analysis as models of autoimmune diseases due to the unnatural T cell selection, antigen presentation/recognition process, and immune system disruption due to xenogeneic GVHD onset.

      In the present study, Vecchione et al. detailed the mechanisms of autoimmune disease-like pathologies observed in a humanized mouse (Human immune system; HIS mouse) model, demonstrating the importance of CD4+ Tfh and Tph cells for the disease onset. They clarified the conditions under which these T cells become reactive using techniques involving the human thymus engraftment and mouse thymectomy, showing their ability to trigger B cell responses, although this was not a major factor in the mouse pathology. These valuable findings provide an essential basis for interpreting past and future autoimmune disease research conducted using HIS mice.

      Strengths:

      (1) Experiments with mice transplanted with human thymus and HSCs were repeated with sufficient reproducibility, with each experiment sometimes taking over 30 weeks and requiring considerable effort. While the interpretation of the results is still debatable, these descriptions provide valuable knowledge for this field of research.

      (2) Mechanistic analysis of T-B interaction in humanized mice, which has not been extensively addressed before, suggests part of the activation mechanism of autoreactive B cells. Additionally, the differences in pathogenicity due to T cell selection by either the mouse or human thymus are emphasized, which encompasses the essential mechanisms of immune tolerance and activation in both central and peripheral systems.

      Weaknesses:

      (1) In this manuscript, for example in Figure 2, the proportion of suppressive cells such as regulatory T cells is not clarified, making it unclear to what extent the percentages of Tph or Tfh cells reflect immune activation. It would have been preferable to distinguish follicular regulatory T cells, at least. While Figure 3 shows that Tregs are gated out using CD25- cells, it is unclear how the presence of Treg cells functionally affects the immunogenicity of the overall cell population.

      We analyzed the % FOXP3+ cells and the % of ICOS+ cells within the Tfh and Tph cells in the spleen of Hu/Hu and Mu/Hu mice at 20 weeks post-transplantation. Importantly, we see no difference in FOXP3 expression between Tfh of Mu/Hu and Hu/Hu mice. The results have been added to panels J and K of Figure 2. 

      (2) The definition of "Disease" discussed after Figure 6 should be explicitly described in the Methods section. It seems to follow Khosravi-Maharlooei et al. 2021. If the disease onset determination aligns with GVHD scoring, generally an indicator of T cell response, it is unsurprising that B cell contribution is negligible. The accelerated disease onset by B cell depletion likely results from lymphopenia-induced T cell activation. However, this result does not prove that these mice avoid organ-specific autoimmune diseases mediated by auto-antibodies and the current conclusion by the authors may overlook significant changes. For instance, would defining Disease Onset by the appearance of circulating autoantibodies alter the result of Disease-Free curve? Are there possibly histological findings at the endpoint of the experiment suggesting tissue damage by autoantibodies?

      We have added a definition of disease to the Methods section as requested. Regarding the possibility of antibody-mediated disease that may be missed by this definition, we acknowledge this point in the Discussion section. However, we also discuss the point that the deficient complement pathway in NSG mice is likely to have protected the HIS mice from autoantibody-mediated organ damage.

      (3) Helper functions, such as differentiating B cells into CXCR5+ cells, were demonstrated for both Hu/Hu- and Mu/Hu-derived T cells. This function seemed higher in Hu/Hu than in Mu/Hu mice. From the results in Figures 7-8, Hu/Hu Tph/Tfh cells have a stronger T cell identity and higher activation capacity in vivo on a per-cell basis than those of Mu/Hu mice. However, Hu/Hu T cells lacked the ability to induce class-switching, in contrast to Mu/Hu T cells. The mechanisms causing these functional differences were not fully discussed. Discussions touching on possible changes in TCR repertoire diversity between Mu/Hu- and Hu/Hu-derived T cells would have been beneficial. 

      Consistent with the reviewer’s suggestion, we have previously shown that the TCR repertoire in Mu/Hu mice is less diverse than that in Hu/Hu mice (Khosravi-Maharlooei M, et al., J Autoimmun., 2021). We believe that the narrowed TCR repertoire in the periphery of Mu/Hu mice, combined with the inadequate negative selection in the murine thymus reported in the paper cited above, results in selective peripheral expansion primarily of the few T cell clones that are cross-reactive with HLA/murine self peptide complexes presented by human APCs in the periphery.  We have discussed the reasons why these cells, when transferred to secondary recipients containing the same APCs, might not be as active as the more diverse, HLA-selected T cell repertoire transferred from Hu/Hu mice.  These possible reasons include exhaustion of the T cells in Mu/Hu mice, limited expression of the few targeted HLA-peptide complexes recognized by the narrow cross-reactive TCR repertoire of Mu/Hu T cells and the consequent relatively impaired T-B cell collaboration in these mice.   

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      The authors note that they removed an outlier result from Figures 1 B & C. With only 4 mice it seems difficult to see exactly how they determined the result was an outlier. Presumably, it was quite different from the others but in such a small dataset removing data without a very clear statistical rationale seems likely to strongly influence the results.

      We have revised Fig 1 to include the previously-deleted outlier mouse.   

      Figure 4. The authors describe the follicular area. Were they able to observe any GC-like structures in their data?

      From the examples, I can see that the PNA staining is sometimes diffuse but even if the authors felt they could not observe a distinct GC this should be stated and discussed in the text.

      We now describe the three-color IF staining in more detail in accordance with this comment. We characterized 4 Hu/Hu and 3 Mu/Hu spleens earlier than 20 weeks post-transplant. In all of these mice, distinct B cell areas (CD20+) were obvious and PNA+ cells were more concentrated in the B cell zones. We stained 4 Hu/Hu and 3 Mu/Hu spleens from mice between 20-30 weeks post-transplant and found that B cell areas were smaller in all these spleens compared to those taken before 20 weeks post-transplant. PNA+ areas were also more diffusely distributed and were not enriched in the B cell areas. Only 2 Mu/Hu mice showed clear B cell zones with some enriched PNA+ areas in the B cell zones. Additionally, we stained 2 Hu/Hu and 2 Mu/Hu mice later than week 30 post-transplant. No distinct B cell areas were observed in any of the spleens of these mice, and PNA+ cells were diffusely distributed.  

      In Figure 3E the authors sort CD25-CXCR5-CD45RA- CD4 T-cells as Tph. This does seem a very loose definition including essentially all non-naïve CD4 cells that are not Tregs or Tfh.

      We agree with the reviewer that the distinction between Tph and highly activated CD4 T cells is incomplete.

      However, we have provided several distinctions in our manuscript that support the presence of Tph in HIS mice: 1) Tph cells exhibit very high levels of PD-1, whereas other activated CD4 cells have varying levels of PD-1 expression. 2) Tph cells express IL-21. 3) Tph cells promote B cell differentiation and antibody production. 

      Tph is sometimes a hard cell type to separate from more general highly activated CD4 T-cells. The broad CXCR5-PD1+ phenotype they have used is common in the literature, and the authors have confirmed some enrichment of IL-21 production by these cells. However, they should consider whether there are ways of further confirming this by examination of other markers such as CCR2 and CCR5, or elimination of other effector identities such as Th1 and Th17 or PD1+ exhaustion phenotypes.

      For this study, we chose to follow the commonly used definitions in the literature for Tph and Tfh cells. For this reason, we are careful to refer to “Tph-like” cells rather than Tph cells in this manuscript. Distinguishing Tph cells from other subsets of activated CD4 cells would require further studies such as single cell RNA seq, which we hope to be able to perform in the future with additional funding.  

      Figure 8. The authors perform some analysis of B-cell phenotypes looking at markers such as CD27, IgD in 8B, and CD11c in 8C. Why is CD11c considered in isolation? The level of expression of the other markers would change how this data would be interpreted, e.g. IgD-CD27-CD11c+ = DN2/Atypical cells, IgD-CD27+CD11c+ = Activated or age-associated, etc.

      In response to this comment, we reanalyzed the splenic samples of the donor Mu/Hu and Hu/Hu mice and their adoptive recipients. Interestingly, in the T cell donors, the Mu/Hu B cells included greater proportions of activated/age-associated B cells (IgD-CD27+CD11c+) and atypical cells (IgD-CD27-CD11c+), compared to the Hu/Hu B cells. This is consistent with the increased disease, increased Tph/Tfh and increased IgG antibody findings in the primary Mu/Hu compared to Hu/Hu mice. These results have been added to Figure 5G. We performed a similar analysis in the blood (week 9) and spleen of adoptive recipient mice. These studies showed that activated/age-associated B cells (IgD-CD27+CD11c+) and atypical cells (IgD-CD27-CD11c+) were significantly increased in the adoptive recipients of Hu/Hu Tph and Tfh cells compared to the adoptive recipients of Mu/Hu Tph and Tfh cells (Fig. 8C). These results are consistent with the disease, T cell expansion and antibody results in the adoptive recipients. 

      "Data not shown" occurs often in this manuscript. In some cases what is not shown is potentially important. The authors note in the text relating to Figure 7 that the "purity of the cell populations as assessed by FCM ranged from 56-60% (data not shown)". Those numbers are a little alarming. Does this refer to the purity of the FACS-sorted Tfh and Tph cells prior to transfer? Currently, some of the discussion of this paper is about the possibility of plasticity, with Tfh switching into a Tph phenotype. If the transferred cell populations are 56-60% pure, I don't think it is possible to make any interpretation of plasticity.

      We looked into this further and realized that the purity figure cited in the original manuscript was erroneous due to a misunderstanding on the part of the first author of a question from the senior author. Unfortunately, data on the purity of the FACS-sorted populations were not saved. However, we have added panel B to Figure 7 to show the sorting strategy for Tfh and Tph cells. We agree that any discussion of plasticity between these cell types is speculative, as outgrowth of a minor population is possible even from well-purified sorted cells.  

      Minor points:

      Some graphs have issues with presentation; Figures 5D and 5E, split scale clips data points. 5F the color representing time would be better replaced with direct labels. 6C and 6C some distortion of text clipping other elements.

      We changed the y-axis scales in Figures 5D and 5E to avoid clipping the data points. We also changed the labels in Figure 5F. The distorted text clipping other elements in Figures 6A and 6E has been corrected.  

      The abbreviation LIP is used in the abstract without a clear definition until later in the text.

      This abbreviation has been defined again in the text.

      Generally, the discussion section is quite long.

      We agree that the discussion is quite long, but the results are quite complex and require considerable discussion.  We have attempted to be as concise as possible.

      Reviewer #2 (Recommendations For The Authors):

      Suggestion

      Can Supplementary Figures be merged into the mains for the convenience of readers? There is enough extra margin.

      We prefer to keep the order of main and supplementary figures as they are. 

      There are some confusing results for which I would recommend adding further explanation for readers. For example, about 10% of Hu/Hu CD3+ T cells reacted to Auto-DC in Figure 1B, but neither CD4+ nor CD8+ cells did in Figure 1C.

      We have re-analyzed the data in Fig 1 and included the previously-deleted outlier mouse. 

      Minor

      Figure 3C

      The figure legend does not explain the figure. Hu/Mu or Mu/Mu?

      Both groups were combined in the figure, as the results were similar for both.  The N per group is given in the figure legend.  The same applies to figure 3D.

      Figure 4B, 4C

      Why were Hu/Hu and Mu/Hu data merged only in 4B? They should be discussed in the context of parallel comparison. Both y-axis labels are the same between B and C despite the legend saying differently.

      We switched the order of Figure 4B and 4C, each of which serves a different purpose. Figure 4B aims to demonstrate the similarity between the two groups at each timepoint.  Figure 4C combines the two groups in order to provide sufficient animal numbers to demonstrate the statistically significant changes over time. 

      Figure 5D

      The axis label is missing and an unexplained bar appears. The authors should replace it with a corrected version.

      The axis and the bar in 5D have been corrected.

      Figure 5F

      The legend does not explain the figure. What are these numbers? Also, it is better if the authors add a detailed explanation to the manuscript about the reason why the sum of antibody titer represents the poly-reactivity of IgM in these mice.

      The numbers in the previous version of the figure were ear-tag numbers, which we have now renumbered as animals 1, 2, 3, etc. in each group. Please refer to the final paragraph of the "Autoreactivity of IgM and IgG in HIS Mice" section in the Results section for an explanation of IgM polyreactivity.

      Fig. 7D-E etc.

      The definition of Asterisk is insufficient. Between what to what in the multiple comparisons?

      The green asterisks show significant differences between the Tph in Hu/Hu vs Mu/Hu mice, while the orange asterisks show significant differences between the Tfh in Hu/Hu vs Mu/Hu mice. This has been added to the figure legend.

      Figure 7 ~ Figure 8

      The legends on the figure are confusing due to the different order of figures. The scales are inappropriate in some figures. The readers cannot interpret the data from the unfairly compressed plots.

      We enlarged the plots to make them readable and changed their order.

      Methods

      In the description of the B cell depletion experiments, the authors should cite the figure number directly instead of writing "In the second Experiment ...".

      We have corrected this in the Methods section.

      There is no definition of "disease" onset.

      This definition has been added to the Methods section.

      Several abbreviations are undefined: "LIP", "BLT" ...

      We have defined these in the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1 - I would like the authors to discuss and justify their use of high-dose (1.3%) isoflurane. A recent consensus paper on rat fMRI (Grandjean et al., "A Consensus Protocol for Functional Connectivity Analysis in the Rat Brain.") found that medetomidine combined with low-dose isoflurane provided optimal control of physiology and fMRI signal. To overcome any doubts about the effects of the high-dose anaesthetic, I'd encourage the authors to show the results of their functional connectivity specificity analysis using the same or a similar image processing protocol as described in that consensus paper. This is especially true since the fMRI ICs in Figure 2A appear fairly restricted.

      We thank the reviewer for their insightful comments. We agree that the combination of medetomidine and isoflurane, as recommended by Grandjean et al. in their consensus paper, provides superior physiological stability and fMRI signal quality, and should indeed be considered the preferred protocol for future studies. In fact, we have adopted this combination in our subsequent research [1]. However, the data for the present study were acquired before the consensus recommendations appeared and have already been reported in earlier publications [2, 3]. While isoflurane is not the ideal anesthetic for functional connectivity studies, we have demonstrated in earlier work [4] that isoflurane at 1.3% maintains stable physiological parameters and avoids burst suppression, a key issue with higher isoflurane doses.

      Regarding preprocessing, we acknowledge the importance of standardized approaches as outlined in the consensus paper. However, to maintain methodological consistency with our prior work, we retained the original preprocessing pipeline for this study. This decision ensures comparability with our previous analyses. To address the reviewer’s concerns and encourage further verification, we have uploaded the full dataset to a public repository (as suggested in Comment 4). This will enable other researchers to reanalyze the data using updated preprocessing pipelines or explore additional analyses.

      We have updated the manuscript discussion (page 19) to clearly acknowledge these points:

      “One limitation of our study is that our experimental protocols predate the recently published consensus recommendations for rat fMRI [42], particularly concerning anesthesia and preprocessing pipelines. The use of isoflurane anesthesia, although common at the time of data acquisition, introduces a potential confound due to its known effects on neuronal activity. However, we previously demonstrated that isoflurane at 1.3% maintains stable physiological parameters and avoids burst suppression [43], a concern at higher doses. Furthermore, other studies have reported that low-dose isoflurane remains feasible for resting-state functional connectivity studies [44]. While isoflurane, as a GABA-A agonist, could theoretically interact with the mechanisms of MDMA in the brain, we found no evidence in the literature suggesting significant cross-talk between these substances. Future studies employing medetomidine-based protocols may help minimize this potential confound.

      Regarding data preprocessing, we chose to retain the same pipeline used in our prior publications [13, 14] to maintain methodological consistency. While we recognize the advantages of adopting standardized preprocessing as outlined in the consensus guidelines, this approach ensures comparability with our previous analyses. To facilitate further investigation, we have made the full dataset publicly available (see Data Availability Statement), enabling reanalysis with updated pipelines or additional explorations of this dataset.”

      Comment 2 - I'd also be interested to read more about why the cerebellum was chosen as a reference region, given that serotonin is highly expressed in the cerebellum, and what effects the choice of reference region has on their quantification.

      We examined this question ourselves in a paper dedicated to determining the most suitable reference region for [11C]DASB. While the reviewer is correct that serotonin is also present in the cerebellum, we found the lowest binding of this tracer in the cerebellar gray matter and therefore recommended this region as a valid reference region (“Displaceable binding of (11)C-DASB was found in all brain regions of both rats and mice, with the highest binding being in the thalamus and the lowest in the cerebellum. In rats, displaceable binding was largely reduced in the cerebellar cortex”; see [5]).

      We amended the Materials and Methods section to specify that our previous publication showed the cerebellar gray matter to be an appropriate reference region (page 6):

      “Binding potentials were calculated frame-wise for all dynamic PET scans using the DVR-1 (equation 1) to generate regional BPND values with the cerebellar gray matter as a reference region, which our earlier studies have demonstrated to be the most appropriate for this tracer in rats [5, 6]:”

      Comment 3 - The PET ICs appear less bilateral than the fMRI ICs. Is that simply a thresholding artefact or is it a real signal?

      We thank the reviewer for this observation. The reduced bilaterality of the PET ICs compared with the fMRI ICs is likely due to the inherently lower temporal resolution of PET, which provides far fewer frames (100 compared with 3000 for fMRI). This yields a lower signal-to-noise ratio (SNR) for the ICA, which can affect the stability and symmetry of the ICs, particularly at higher IC numbers. While thresholding may also play a minor role, we believe the primary factor is the poorer SNR of the PET data. We have clarified this point in the discussion section (page 17) as follows:

      “In our analysis, PET ICs appeared less bilateral than fMRI ICs. This is likely due to the lower temporal resolution of PET (100 frames) compared to fMRI (3000 frames), resulting in reduced signal-to-noise ratio (SNR) and potentially affecting the stability and symmetry of the independent components.”
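
      A back-of-the-envelope illustration of the frame-count argument, under the simplifying assumption that frames are independent samples (they are not: autocorrelation lowers the effective sample size in both modalities, and ICA is more than a pairwise correlation). The standard error of a Fisher z-transformed Pearson correlation is 1/sqrt(N - 3), so pairwise estimates from 100 frames are roughly 5.6 times noisier than those from 3000 frames. The function name is hypothetical.

```python
import math


def fisher_z_se(n_samples: int) -> float:
    """Standard error of a Fisher z-transformed Pearson correlation.

    Assumes n_samples independent observations, which is a simplification
    for autocorrelated PET and fMRI frames.
    """
    if n_samples <= 3:
        raise ValueError("need more than 3 samples")
    return 1.0 / math.sqrt(n_samples - 3)


# Frame counts stated in the response: 100 PET frames vs 3000 fMRI frames.
se_pet = fisher_z_se(100)
se_fmri = fisher_z_se(3000)
print(f"PET SE(z) = {se_pet:.3f}, fMRI SE(z) = {se_fmri:.3f}, ratio = {se_pet / se_fmri:.1f}")
```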

      Comment 4 - "The data will be made available upon reasonable request" is not sufficient - please deposit the data in an open repository and link to its location.

      We agree with the reviewer’s request and have uploaded the data to a Dryad repository. We have amended our Data Availability Statement accordingly.

      Comment 5 (recommendation) - Please add the age and sex of the rats in lines 92-97.

      Amended.

      Comment 6 (recommendation) - There are multiple typos throughout the manuscript - for example, "z-vlaue" on line 164, "negligable" on line 194, etc.. Sometimes the 11 in 11C is superscripted, sometimes it isn't. This paper would benefit from a careful proofread.

      Thank you for pointing this out. We sent the manuscript for language and grammar editing to AJE (see certificate).

      Reviewer 2:

      Comment 1 - While the study protocol is referenced in the paper, it would be useful to at least report whether the study uses bolus, constant infusion, or a combination of the two and the duration of the frames chosen for reconstruction. Minimal details on anesthesia should also be reported, clarifying whether an interaction between the pharmacological agent for anesthesia and MDMA can be expected (whole-brain or in specific regions).

      We fully agree that this would improve the readability of our manuscript and have added the information to the Materials and Methods and Discussion sections accordingly (pages 4 and 5).

      Comment 2 - Some terminology is used somewhat unclearly. For example, "seed-based" usually refers to seed-to-voxel rather than ROI-to-ROI analysis, and it is a bit confusing to have IC1 called the SERT network when in fact all ICs derived from DASB data are SERT networks. Perhaps different wording could be used (IC1 = SERT xxxxx network; IC2 = SERT salience network).

      Based on the reviewer’s suggestion, we propose renaming IC1 and IC2 according to their anatomical and functional characteristics (page 13):

      “IC1 = SERT Salience Network: This name highlights the involvement of the regions typically associated with the salience network (e.g., CPu, Cg, NAc, Amyg, Ins, mPFC), which play key roles in emotional and cognitive processing.”

      “IC2 = SERT Subcortical Network: This name reflects the involvement of subcortical regions, such as the hypothalamus, PAG, and thalamus, which play a role in arousal, stress response, and autonomic regulation and are heavily modulated by serotonin.”

      Comment 3 - The sample size of rats undergoing pharmacological stimulation is limited, which may leave the study underpowered. This may not be a problem if the observed MDMA effect is particularly consistent across rats. Information on the inter-individual variability of FC, MC, and BPND could be provided in this regard.

      We thank the reviewer for raising this point. To address the concern about limited sample size and inter-individual variability, we have added this information to Figures 5B and 5D. Regarding BPND variability, the dotted lines in Figure 3 indicate the standard deviation of the regional BPND values; however, this was not clearly stated in the original figure legend. We have now amended the legend to state this explicitly.

      Comment 4 (recommendation) - "Our research employs a novel approach named "molecular connectivity" (MC), which merges the strengths of various imaging methods to offer a comprehensive view of how molecules interact within the brain and affect its function." I'd recommend rephrasing to "..how molecules interact across different areas within the brain..". Molecular connectivity is a potentially ambiguous term: it is used both to study interactions across different molecules (in the same compartment/environment) and to study interactions of the same molecules across different areas. I'd also add a couple of references to help the reader disambiguate (e.g. https://pubmed.ncbi.nlm.nih.gov/30544240/ , https://pubmed.ncbi.nlm.nih.gov/36621368/)

      We appreciate the reviewer’s suggestion and agree that the term "Molecular Connectivity" could be ambiguous. To clarify, we rephrased the description to emphasize that our approach specifically examines interactions of the same molecule (i.e., serotonin transporter) across different brain regions, rather than interactions between different molecules within the same environment. We propose the following revised text (page 2):

      “Our research employs a novel approach termed molecular connectivity (MC), which combines the strengths of various imaging methods to provide a comprehensive view of how specific molecules, such as the serotonin transporter, interact across different brain regions and influence brain function.”

      Additionally, we will incorporate the suggested references to help the reader further contextualize the use of this term.

      Comment 5 - In the methods, it is not clear whether the authors also compute ROI-to-ROI correlations for MC or only ICA.

      Thank you for highlighting this point. To clarify, our MC analysis includes both ROI-to-ROI correlations and ICA. Specifically, as described at the end of the “Molecular Connectivity Analysis” subsection, we compute ROI-to-ROI correlations using the following steps: 1. The first 20 minutes of each scan are discarded to account for perfusion effects. 2. A detrending approach is applied to the remaining 60 minutes of BP<sub>ND</sub> time courses. 3. ROI-to-ROI correlations are then calculated and organized into subject-level correlation matrices, which are subsequently z-transformed to generate mean correlation matrices across subjects.

      We revised the methods section to explicitly state that both ROI-to-ROI correlations and ICA are integral components of the MC analysis to ensure this point is clear to readers (page 6).

      “The BP<sub>ND</sub> time courses were then used to calculate MC as described above for fMRI: ROI-to-ROI subject-level correlation matrices between all regional time courses were generated and z-transformed correlation coefficients were used to calculate mean correlation matrices.”
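
      The three ROI-to-ROI steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input layout (subjects x frames x ROIs), the availability of frame mid-times in minutes, a linear detrend (the response does not specify the detrending method), and averaging in Fisher z space are all assumptions, and the function name is hypothetical.

```python
import numpy as np
from scipy.signal import detrend


def mc_matrices(bpnd, frame_times_min, discard_min=20.0):
    """Sketch of ROI-to-ROI molecular connectivity from frame-wise BPND.

    bpnd: array of shape (n_subjects, n_frames, n_rois).
    frame_times_min: frame mid-times in minutes (assumed known).
    Returns the subject-averaged Fisher z matrix and its tanh back-transform.
    """
    bpnd = np.asarray(bpnd, dtype=float)
    # Step 1: drop the early frames assumed to be dominated by perfusion.
    keep = np.asarray(frame_times_min) >= discard_min
    z_mats = []
    for subject in bpnd:
        # Step 2: remove a linear trend from each ROI time course.
        ts = detrend(subject[keep], axis=0, type="linear")
        # Step 3: subject-level ROI-to-ROI Pearson correlation matrix.
        r = np.corrcoef(ts, rowvar=False)
        np.fill_diagonal(r, 0.0)  # avoid arctanh(1) = inf on the diagonal
        z_mats.append(np.arctanh(np.clip(r, -0.999999, 0.999999)))  # Fisher z
    z_mean = np.mean(z_mats, axis=0)  # mean across subjects in z space
    return z_mean, np.tanh(z_mean)
```

      Averaging in z space and back-transforming with tanh is one common convention; averaging raw correlations would be an alternative choice.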

      Comment 7 - In the discussion, it could be useful to relate IC1 and IC2 to well-established neuroanatomical/molecular knowledge of the serotoninergic system. Did the authors expect the IC1 and IC2 anatomical distributions? Is there a plausible biological reason why the time courses of BPnd variations would differ between IC1 and IC2?

      We appreciate the reviewer’s insightful comment and agree on the importance of relating IC1 and IC2 to well-established neuroanatomical and molecular knowledge of the serotonergic system.

      In our discussion, we noted that IC1 primarily encompasses subcortical structures such as the brainstem, midbrain, and thalamus. These regions are consistent with areas housing dense serotonergic projections originating from the raphe nuclei, the primary source of serotonin release. In contrast, IC2 involves limbic and cortical regions - including the striatum, amygdala, cingulate, insular, and prefrontal cortices - which are key targets of the serotonergic pathways. This anatomical distinction aligns with the hierarchical organization of the serotonergic system, where the brainstem nuclei exert both local and distal serotonergic modulation.

      The observed differences in the temporal dynamics of the binding potential (BP<sub>ND</sub>) variations between IC1 and IC2 likely reflect the distinct functional roles of these regions within the serotonergic network. The more immediate changes in IC1 could be attributed to the direct effect of MDMA on the raphe nuclei, leading to rapid serotonin release in subcortical structures. In contrast, the delayed changes in IC2 may reflect downstream modulation in cortical and limbic regions involved in processing more complex emotional and cognitive functions.

      That said, while these interpretations are plausible based on current neuroanatomical and functional knowledge, the exact biological mechanisms underlying the differential time courses remain unclear. As discussed in the manuscript, future studies incorporating direct, simultaneous measurements of serotonin levels and imaging data will be essential to fully elucidate the temporal and spatial dynamics of serotonin transmission in these regions. We have revised the discussion section (page 17) to better highlight this limitation as an important area for further investigation:

      “Our results demonstrate that compared with FC, MDMA induces more pronounced changes in MCs, particularly in regions associated with the SERT subcortical network. The distinct temporal dynamics of BP<sub>ND</sub> variations between these components may reflect the hierarchical organization of the serotonergic system. Specifically, the raphe nuclei, as the primary source of serotonin, are likely to exert more immediate modulation of posterior subcortical structures (IC2), whereas downstream effects on limbic and cortical regions (IC1) may occur more gradually. While these findings align with current neuroanatomical and molecular knowledge, the precise biological mechanisms driving these temporal differences remain unclear and warrant further investigation. Future studies combining direct measurements of serotonin levels with neuroimaging data will be critical to fully understand these components’ distinct roles and temporal profiles in regulating serotonergic function.”

      Comment 8 - In the discussion (physiological basis), could the authors detail the expected "time scale" in changes in SERT expression? How quickly can SERT expression change, especially under resting-state conditions? Is it reasonable to consider tracer fluctuations under rest conditions as biologically meaningful?

      SERT regulation can occur over different time scales depending on the mechanism involved [7].

      Acute, rapid changes (milliseconds to seconds): Protein-protein interactions with key regulatory proteins (e.g., syntaxin1A, neuronal nitric oxide synthase) can lead to rapid modulation of SERT surface expression [8-11]. These interactions often involve changes in transporter trafficking or conformational states and can occur within milliseconds to seconds. For example, syntaxin1A directly interacts with the N-terminus of SERT, influencing its availability on the plasma membrane within short timescales.

      Intermediate time scales (seconds to minutes): Posttranslational modifications, such as phosphorylation by kinases (e.g., protein kinase C) or dephosphorylation by phosphatases, are known to influence SERT function and surface expression [12-14]. These processes are typically initiated in response to cellular signaling and occur over seconds to minutes, affecting the SERT trafficking dynamics and serotonin uptake capacity [15, 16].

      Longer-term changes (minutes to hours): Longer-term regulation involves processes like endocytosis, recycling, or degradation of SERT. These pathways typically take minutes to hours and are often part of more sustained cellular responses to changes in neuronal activity or serotonin levels. Such changes are slower but contribute to the overall cellular homeostasis of SERT under prolonged stimulation.

      Under resting-state conditions, where neurons are not subjected to rapid or dramatic fluctuations in neurotransmitter release or signaling, SERT expression and activity are generally stable but still subject to subtle fluctuations due to ongoing basal regulatory processes. Basal phosphorylation or low-level protein-protein interactions can still dynamically modulate SERT trafficking and function, albeit at a lower intensity than under stimulated conditions. These fluctuations, although smaller in magnitude, may reflect fine-tuning of serotonin homeostasis and can occur on shorter timescales (seconds to minutes).

      Biological Relevance of Tracer Fluctuations at Rest:

      It is reasonable to consider that tracer fluctuations under resting conditions could reflect biologically meaningful variations in SERT expression and function. Even subtle shifts in SERT surface availability or activity can impact serotonin clearance and signaling, given the fine balance required to maintain serotonergic tone. These fluctuations may reflect intrinsic neuronal variability or ongoing homeostatic adjustments to maintain optimal neurotransmitter levels, or they may serve as early indicators of adaptive responses to environmental or physiological changes before more overt modifications in transporter expression or activity become apparent.

      In summary, SERT expression can change rapidly in response to signaling events (milliseconds to minutes), and even under resting-state conditions, subtle regulatory fluctuations can be biologically meaningful. These fluctuations likely reflect ongoing regulatory adjustments essential for maintaining serotonergic balance and should not be disregarded as noise, particularly in experimental measurements using tracers.

      We added the following paragraph to the discussion (page 16):

      In addition, SERT regulation occurs over multiple time scales, ranging from milliseconds to hours, depending on the mechanism involved [31]. Rapid changes in SERT surface expression can be mediated by protein-protein interactions or posttranslational modifications [32, 33], such as phosphorylation, which occur on a timescale of milliseconds to minutes. These processes dynamically modulate surface availability and function, allowing fine-tuned regulation of serotonin uptake even under resting-state conditions. Additionally, while slower processes involving endocytosis, recycling, and degradation typically occur over minutes to hours, subtle fluctuations in SERT trafficking and activity can still occur under basal conditions. These minor yet biologically relevant changes likely reflect ongoing homeostatic regulation essential for maintaining serotonergic balance. Therefore, tracer fluctuations observed during resting-state measurements should not be dismissed, as they may represent meaningful variations in SERT regulation that contribute to the fine control of serotonin clearance.

      Comment 9 - In the discussion, the SERT network results should be commented on more extensively, as there is now only a generic reference to MC changes being stronger than FC ones, without spatial reference to the SERT network (while only negative salience network results are referenced explicitly instead, making the paragraph a bit confusing).

      We expanded the discussion to accommodate a more thorough contemplation of this network. This revised paragraph (page 17) directly addresses the spatial aspects of the SERT network, highlighting the specific regions involved in serotonergic connectivity and contrasting molecular and functional connectivity changes induced by MDMA.

      Comment 10 - Figure 3; I'd switch left and right charts in the bottom panel (last row only), to keep the SERT network always on the left of the Figure.

      We agree with the suggestion and changed the figure accordingly.

      Comment 11 - Figure 4: I'd add FC decreases to the figure, to allow the reader to compare BPnd, MC, and FC changes more easily and I'd add a horizontal line at the equivalent of e.g. Z-1.96 (or similar) so that it is clear which measures/regions display significant changes.

      We prefer to keep the figure focused on the two analyses of PET alterations, since we want to emphasize their complementarity specifically in the context of PET. However, in line with the reviewer’s suggestion, we added lines indicating significance thresholds.

      Comment 12 - In Figure 5D, the y-axis mentioned FC but I suppose it should mention MC.

      We amended the figure accordingly, together with the changes to the names of the networks implemented across the manuscript.

      (1) Marciano, S., et al., Combining CRISPR-Cas9 and brain imaging to study the link from genes to molecules to networks. Proc Natl Acad Sci U S A, 2022. 119(40): p. e2122552119.

      (2) Ionescu, T.M., et al., Striatal and prefrontal D2R and SERT distributions contrastingly correlate with default-mode connectivity. Neuroimage, 2021. 243: p. 118501.

      (3) Ionescu, T.M., et al., Neurovascular Uncoupling: Multimodal Imaging Delineates the Acute Effects of 3,4-Methylenedioxymethamphetamine. J Nucl Med, 2023. 64(3): p. 466-471.

      (4) Ionescu, T.M., et al., Elucidating the complementarity of resting-state networks derived from dynamic [(18)F]FDG and hemodynamic fluctuations using simultaneous small-animal PET/MRI. Neuroimage, 2021. 236: p. 118045.

      (5) Walker, M., et al., In Vivo Evaluation of 11C-DASB for Quantitative SERT Imaging in Rats and Mice. J Nucl Med, 2016. 57(1): p. 115-21.

      (6) Walker, M., et al., Imaging SERT Availability in a Rat Model of L-DOPA-Induced Dyskinesia. Mol Imaging Biol, 2020. 22(3): p. 634-642.

      (7) Lau, T. and P. Schloss, Differential regulation of serotonin transporter cell surface expression. Wiley Interdisciplinary Reviews: Membrane Transport and Signaling, 2012. 1(3): p. 259-268.

      (8) Haase, J., et al., Regulation of the serotonin transporter by interacting proteins. Biochem Soc Trans, 2001. 29(Pt 6): p. 722-8.

      (9) Quick, M.W., Regulating the conducting states of a mammalian serotonin transporter. Neuron, 2003. 40(3): p. 537-49.

      (10) Ciccone, M.A., et al., Calcium/calmodulin-dependent kinase II regulates the interaction between the serotonin transporter and syntaxin 1A. Neuropharmacology, 2008. 55(5): p. 763-70.

      (11) Chanrion, B., et al., Physical interaction between the serotonin transporter and neuronal nitric oxide synthase underlies reciprocal modulation of their activity. Proc Natl Acad Sci U S A, 2007. 104(19): p. 8119-24.

      (12) Qian, Y., et al., Protein kinase C activation regulates human serotonin transporters in HEK-293 cells via altered cell surface expression. J Neurosci, 1997. 17(1): p. 45-57.

      (13) Ramamoorthy, S., et al., Phosphorylation and regulation of antidepressant-sensitive serotonin transporters. J Biol Chem, 1998. 273(4): p. 2458-66.

      (14) Jayanthi, L.D., et al., Evidence for biphasic effects of protein kinase C on serotonin transporter function, endocytosis, and phosphorylation. Mol Pharmacol, 2005. 67(6): p. 2077-87.

      (15) Steiner, J.A., A.M. Carneiro, and R.D. Blakely, Going with the flow: trafficking-dependent and -independent regulation of serotonin transport. Traffic, 2008. 9(9): p. 1393-402.

      (16) Lau, T., et al., Monitoring mouse serotonin transporter internalization in stem cell-derived serotonergic neurons by confocal laser scanning microscopy. Neurochem Int, 2009. 54(3-4): p. 271-6.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We appreciate your substantial updates in response to the reviewers' comments. However, the statistical issue remains unresolved. The following is the general way to obtain fold changes between control and experimental samples. Each sample yields a relative value of the target molecule against its internal control. In Fig 1B, the target is pSmad2 and the internal control is total Smad2. The three control samples therefore generate three pSmad2/Smad2 ratios, which vary, and the T204D samples likewise generate three ratios with their own variation. The mean of the three control ratios is then set to 1 (the individual controls retain their variation), and fold changes between the control and T204D groups are calculated relative to it. The point is that statistical significance needs to be evaluated between two groups that both show variation. This standard method differs from what you described in the manuscript. I hope this explains why the issue needs to be fixed. Please revise the following 11 panels.

      (1) Fig 1B, WB, pSmad2, reference Smad2, loading control GAPDH, fold change by T204D.

      (2) Fig 1C, WB, pSmad2, reference Smad2, loading control GAPDH, fold change by Tb/Rudhira.

      (3) Fig 1D, QRT PCR, pai1/mmp9, fold change by Tb treatment, reference not disclosed.

      (4) Fig 2A, migration, crystal red absorbance.

      (5) Fig 2B, migration, crystal red absorbance.

      (6) Fig 4A, QRT PCR, fold change by Tb.

      (7) Fig 4B, WB, Rudhira, fold change by Tb.

      (8) Fig 4C, intensity, with variation, fine.

      (9) Fig 4D, WB, Rudhira, loading control GAPDH, fold change by Smad2/3 silencing.

      (10) Fig 5A, WB, Rudhira/Glu-Tub, loading control GAPDH, fold change by Tb and/or AcD.

      (11) Fig 5C, WB, Glu-Tub.
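
      The normalization described in this comment can be sketched as follows. The band intensities are hypothetical, chosen only to illustrate the arithmetic, and the function name is likewise hypothetical. The comment does not name a statistical test; Welch's two-sample t-test is used here as one reasonable choice for two groups of three.

```python
import numpy as np
from scipy import stats


def normalize_to_control_mean(ctrl_target, ctrl_ref, trt_target, trt_ref):
    """Return per-sample fold changes for the control and treated groups.

    Each sample is first expressed as target/reference (e.g., pSmad2/Smad2).
    Every ratio, controls included, is then divided by the mean control ratio,
    so the control group averages 1 but keeps its sample-to-sample variation.
    """
    ctrl_ratio = np.asarray(ctrl_target, float) / np.asarray(ctrl_ref, float)
    trt_ratio = np.asarray(trt_target, float) / np.asarray(trt_ref, float)
    ctrl_mean = ctrl_ratio.mean()
    return ctrl_ratio / ctrl_mean, trt_ratio / ctrl_mean


# Hypothetical band intensities (arbitrary units), three samples per group.
ctrl_fc, trt_fc = normalize_to_control_mean(
    ctrl_target=[1.0, 1.2, 0.9], ctrl_ref=[1.0, 1.1, 1.0],
    trt_target=[2.1, 2.5, 1.9], trt_ref=[1.0, 1.2, 0.9],
)
# Both groups carry variance into the test.
t_stat, p_value = stats.ttest_ind(ctrl_fc, trt_fc, equal_var=False)
print("control:", ctrl_fc.round(2), "treated:", trt_fc.round(2), "p =", round(p_value, 4))
```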

      For western blots:

      Graphs for western blots in the following figures have been modified to show the variance in controls, as suggested:

      (1) Fig 1B, WB, pSmad2, reference Smad2, loading control GAPDH, fold change by T204D.

      (2) Fig 1C, WB, pSmad2, reference Smad2, loading control GAPDH, fold change by Tb/Rudhira.

      (7) Fig 4B, WB, Rudhira, fold change by Tb.

      (9) Fig 4D, WB, Rudhira, loading control GAPDH, fold change by Smad2/3 silencing.

      (10) Fig 5A, WB, Rudhira/Glu-Tub, loading control GAPDH, fold change by Tb and/or AcD.

      (11) Fig 5C, WB, Glu-Tub.

      For qPCRs:

      The reader’s comment asked us to display error bars for the controls if their variance was considered. We did not consider the variance in controls, which is standard practice in qPCR assays. In this regard, an example from an eLife paper is cited below (variation not considered in controls):

      Fig 4C from Conti et al., N6-methyladenosine in DNA promotes genome stability, revised v2 Feb 3, 2025.

      Accordingly, the following graphs remain unchanged:

      (3) Fig 1D, QRT PCR, pai1/mmp9, fold change by Tb treatment, reference not disclosed.

      (6) Fig 4A, QRT PCR, fold change by Tb.

      For crystal violet experiments:

      Because variability is introduced by crystal violet (CV) preparation, uptake, extraction, etc., it is not possible to determine absolute cell numbers across experiments without a reference or standard. To simplify the calculation, we normalize the CV intensity of all samples to the control of the same experiment, so the control group has no error bars. In this regard, an example from an eLife paper is cited below (variation not considered in controls).

      Fig 2H from Brunner et al., PTEN and DNA-PK determine sensitivity and recovery in response to WEE1 inhibition in human breast cancer, version of record July 6, 2020.

      Accordingly, the following graphs remain unchanged:

      (4) Fig 2A, migration, crystal red absorbance.

      (5) Fig 2B, migration, crystal red absorbance.
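
      The per-experiment normalization used for the crystal violet graphs can be sketched as follows. The absorbance values are hypothetical, and the function and condition names are placeholders. Because each experiment's control is divided by itself, every control value is exactly 1 and its standard deviation across experiments is 0, which is why the control bars carry no error bars; the treated conditions keep their between-experiment variation.

```python
import numpy as np


def normalize_per_experiment(absorbance, control_key="control"):
    """Divide every condition's CV absorbance by the control of the same experiment."""
    return {
        exp_id: {cond: value / conditions[control_key] for cond, value in conditions.items()}
        for exp_id, conditions in absorbance.items()
    }


def summarize(normalized):
    """Mean and sample SD of each condition across experiments."""
    conditions = next(iter(normalized.values())).keys()
    summary = {}
    for cond in conditions:
        values = np.array([exp[cond] for exp in normalized.values()])
        summary[cond] = (values.mean(), values.std(ddof=1))
    return summary


# Hypothetical absorbance readings from three independent experiments.
raw = {
    "exp1": {"control": 0.82, "treated": 0.41},
    "exp2": {"control": 1.10, "treated": 0.60},
    "exp3": {"control": 0.95, "treated": 0.50},
}
for cond, (mean, sd) in summarize(normalize_per_experiment(raw)).items():
    print(f"{cond}: mean = {mean:.2f}, SD = {sd:.2f}")
```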

      Lastly, #8 remains unchanged.

      (8) Fig 4C, intensity, with variation, fine.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public reviews):

      Summary:

      In this study, Fakhar et al. use a game-theoretical framework to model interregional communication in the brain. They perform virtual lesioning using MSA to obtain a representation of the influence each node exerts on every other node, and then compare the optimal influence profiles of nodes across different communication models. Their results indicate that cortical regions within the brain's "rich club" are most influential.

      Strengths:

      Overall, the manuscript is well-written. Illustrative examples help to give the reader intuition for the approach and its implementation in this context. The analyses appear to be rigorously performed and appropriate null models are included.

      Thank you.

      Weaknesses:

      The use of game theory to model brain dynamics relies on the assumption that brain regions are similar to agents optimizing their influence, and implies competition between regions. The model can be neatly formalized, but is there biological evidence that the brain optimizes signaling in this way? This could be explored further. Specifically, it would be beneficial if the authors could clarify what the agents (brain regions) are optimizing for at the level of neurobiology - is there evidence for a relationship between regional influence and metabolic demands? Identifying a neurobiological correlate at the same scale at which the authors are modeling neural dynamics would be most compelling.

      This is a fundamental point, and we have started a new project to address it. The current work focuses, first, on rigorously formalizing the prevailing assumption that brain regions optimize communication and, second, on uncovering the characteristics of communication if this optimization is indeed taking place. Based on our findings, we suspect that optimal communication is achieved through broadcasting (compared with the other modes explored in our work, e.g., shortest-path signalling or diffusion). However, we recognize that our game-theoretical framework does not directly address how this mechanism is implemented. In our follow-up work, we are therefore analyzing available datasets of signal propagation in the brain to test whether the communication dynamics there match the predictions of the game-theoretical setup. Following your question, we also extended our discussion to cover this point, cited five other works on this topic, and described what we think could be the neurobiological mechanism of optimal signalling.

      It is not entirely clear what Figure 6 is meant to contribute to the paper's main findings on communication. The transition to describing this Figure in line 317 is rather abrupt. The authors could more explicitly link these results to earlier analyses to make the rationale for this figure clearer. What motivated the authors' investigation into the persistence of the signal influence across steps?

      Great question. Figure 6 partly follows from Figure 5, which summarizes a key aspect of our work: signals subside at every step, but not exponentially (Figure 5), and they nearly vanish after around 6 steps (Figure 6A and B). Subplots A and B together suggest that although measures like communicability account for all possible pathways, the network uses only a handful of them, presumably to balance signalling robustness against the energetic cost of signalling. Subplot C, one of our main findings, then shows that a single simple model is all that is needed to predict a large portion of optimal influence, compared with other models and variables. In sum, Figure 5 focuses on the decay dynamics, while Figure 6 focuses on the extent of propagation, in terms of steps, given that the decay is monotonic. Our motivation for this figure was therefore to show how the right assumption about decay rate and dynamics can outperform other measures in predicting optimal communication.

      The authors used resting-state fMRI data to generate functional connectivity matrices, which they used to inform their model of neural dynamics. If I understand correctly, their functional connectivity matrices represent correlations in neural activity across an entire fMRI scan computed for each individual and then averaged across individuals. This approach seems limited in its ability to capture neural dynamics across time. Modeling time series data or using a sliding window FC approach to capture changes across time might make more sense as a means of informing neural dynamics.

      We agree that static functional connectivity is limited in capturing neural dynamics. However, we opted not to fit dynamic functional connectivity yet, for a practical reason: the other communication models used here are not fitted to any empirical data and provide a static view of the dynamics, comparable to static functional connectivity. Since one of our goals was to compare different communication regimes, and since fitting the dynamics does not seem to substantially change the outcome when the end result is static (Figure 7), we decided to use this coarser representation of neural data for this work. However, part of our follow-up project involves examining the dynamics of influence over time, and for that we will fit our models to more realistic dynamics.

      The authors evaluated their model using three different structural connectomes: one inferred from diffusion spectrum imaging in humans, one inferred from anterograde tract tracing in mice, and one inferred from retrograde tract-tracing in macaque. While the human connectome is presumably an undirected network, the mouse and macaque connectomes are directed. What bearing does experimentally inferred knowledge of directionality have on the derivation of optimal influence and its interpretation?

      Regarding whether directionality changes the interpretation of optimal influence, we think it limits how far we can compare the communication dynamics of these two types of networks. We think interpreting optimal communication in directed graphs requires disentangling incoming influence from outgoing influence, e.g., analyzing “projector hubs/coordinators” and “receiver hubs/integrators” instead of grouping both into a common class of hubs. Also, here we showed the extent to which a signal travels before it significantly degrades, and we did so in an undirected graph. One implication for a directed graph is that some nodes may be unreachable from others, given the more restricted navigation. We did not observe this in the human connectome, as all nodes could reach all others, albeit with limited influence (see Figure 2C). We did not explore these differences, as we used the mouse and macaque connectomes primarily to control for modality-specific confounds of DSI. However, our relatively poorer fit for directed networks (Supplementary Figure 2) motivated us to analyze how reciprocal connections shape dynamics and what impact they have on network function. Using the same connectomes as the current work, we addressed this question in a separate publication (Hadaeghi et al., 2024) and plan to extend both works by analyzing the signalling properties of directed networks.

      It would be useful if the authors could assess the performance of the model for other datasets. Does the model reflect changes during task engagement or in disease states in which relative nodal influence would be expected to change? The model assumes optimality, but this assumption might be violated in disease states.

      This is a wonderful idea that we initially had in mind for this work as well, but we decided to dedicate a separate study to deviations in different task states, as well as in disease states (mainly neurodegenerative disorders). We found that the practical challenges of fitting large-scale models to task dynamics and of harmonizing neuroimaging datasets of neurodegenerative disorders put this beyond the scope of the current work. Unfortunately, this effort, although exciting and promising, is still pending, as the corresponding author does not yet have the required expertise in neuroimaging processing pipelines.

      The MSA approach is highly computationally intensive, which the authors touch on in the Discussion section. Would it be feasible to extend this approach to task or disease conditions, which might necessitate modeling multiple states or time points, or could adaptations be made that would make this possible?

      Continuing our response from the previous point: yes, we think that, in theory, the framework is applicable to both settings. Currently, our main concern is not the computational cost of the framework but the harmonization of the data, to ensure that differences in results are not due to differences in preprocessing steps. Assuming that this is taken care of, we believe a reasonable compute cluster should suffice if the analytical pipeline is parallelized over subjects. We acknowledge that the process would still be time-consuming: setting aside the fitting process, we expect a modern high-performance CPU with about 32–64 threads to take up to 3 days to analyze one subject with 100 brain regions or fewer. Throughput then scales with the number of cluster nodes, each working on one subject. We note that analytical estimators such as SAR could be used instead, as they largely predict the results from MSA. The limitations are then the lack of dynamics over time and potential estimation errors.

      Reviewer #2 (Public review):

      Summary:

      The authors provide a compelling method for characterizing communication within brain networks. The study engages important, biologically pertinent, concerns related to the balance of dynamics and structure in assessing the focal points of brain communication. The methods are clear and seem broadly applicable, however further clarity on this front is required.

      Strengths:

      The study is well-developed, providing an overall clear exposition of relevant methods, as well as in-depth validation of the key network structural and dynamical assumptions. The questions and concerns raised in reading the text were always answered in time, with straightforward figures and supplemental materials.

      Thank you.

      Weaknesses:

      The narrative structure of the work at times conflicts with the interpretability. Specifically, in the current draft, the model details are discussed and validated in succession, leading to confusion. Introducing a "base model" and "core datasets" needed for this type of analysis would greatly benefit the interpretability of the manuscript, as well as its impact.

      Following your suggestion, we modified the introduction to emphasize the human connectome and the linear model as the main toolkit. We also added a paragraph explaining the other datasets that can be used instead.

      Recommendations for the authors:

      Essential Revisions (for the authors):

      (1) The method presents an important and well-validated method for linking structural and functional networks, but it was not clear precisely what the necessary data inputs were and what assumptions about the data mattered. To improve the clarity of the presentation for the reader, it would be beneficial to have an early and explicit description of the flow of the method - what exact kinds of datasets are needed and what decisions need to be made to perform the analysis. In addition, there were questions about how the use or interpretation of the method might change with different methods of measuring structure or function, which could be answered via an explicit discussion of the issue. For example, how do undirected fMRI correlation networks compare to directed tracer injection projection networks? Similarly, could this approach apply in cases like EM connectomics with linked functional imaging that do not have full observability in both modalities?

      This is an important point that we did not address in detail in the original manuscript. We have now done so, first by adding a paragraph (lines 292-305, page 10) explaining the pipeline and how our framework handles different modeling choices, and then by discussing it further in the Discussion (lines 733-748, page 28). Moreover, we adjusted Figure 1 by delineating the two main steps of the pipeline. Briefly, we clarified that MSA is model-agnostic, meaning that, in principle, any model of neural dynamics can be used with it, from the most abstract to the most biologically detailed. Moreover, the approach extends to networks built on EM connectomics, tract tracing, DTI, and other measures of anatomical connectivity. However, we realized that a key detail was not explicitly discussed (pointed out by Reviewer #2): these models naturally need to be fitted to the empirical dataset, even though this fitting step appears not to be critical, as shown in Figure 7.

      Lines 292-305:

      “The MSA begins by defining a ‘game.’ To derive OSP, this game is formulated as a model of dynamics, such as a network of interacting nodes. These can range from abstract epidemic and excitable models (Garcia et al., 2012; Messé et al., 2015a) to detailed spiking neural networks (Pronold et al., 2023) and to mean-field models of the whole brain dynamics, as chosen here (see below). The model should ideally be fitted to reflect real data dynamics, after which MSA systematically lesions all nodes to derive the OSP. Put together, the framework is general and model-agnostic in the sense that it accommodates a wide range of network models built on different empirical datasets, from human neuroimaging and electrophysiology to invertebrate calcium imaging, and anything in between. In essence, the framework is not bound to specific modelling paradigms, allowing direct comparison among different models (e.g., see section Global Network Topology is More Influential Than Local Node Dynamics).”

      Lines 733-740:

      “As noted in the introduction, OI is model-agnostic; here, we leveraged this flexibility to compare signaling under different models of local dynamics, primarily built upon undirected human connectome data. We also considered different modalities, e.g., tract tracing in macaque (see Structural and Functional Connectomes under Materials and Methods), to confirm that the influence of weak connections is not inflated due to imaging limitations (Supplementary Figure 5A). The game-theoretical formulation of signaling allows for systematic comparison among many combinations of modeling choices and data sources.”

      We then continued with addressing the issue of full observability. We clarified that in this work, full observability was assumed. However, the mathematical foundations of our method capture unobserved contributors/influencers as an extra term, similar to the additive error term of a linear regression model. To keep the paper as non-technical as possible, we omitted expanding the axioms and the proof of how this is achieved, and instead referred to previous papers introducing the framework. 

      Lines 740-748:

      “Nonetheless, in this work, we assumed full observability, i.e., complete empirical knowledge of brain structure and function that is not necessarily practically given. Although a detailed investigation of this issue is needed, mathematical principles behind the method suggest that the framework can isolate the unobserved influences. In these cases, activity of the target node is decomposed such that the influence from the observed sources is precisely mapped, while the unobserved influences form an extra term, capturing anything that is left unaccounted for, see (Algaba et al., 2019b; Fakhar et al., 2024) for more technical details.”

      (2) The value of the normative game theoretic approach was clear, but the neurobiological interpretation was less so. To better interpret the model and understand its range of applicability, it would be useful to have a discussion of the potential neurobiological correlates that were at the same level of resolution as the modeling itself. Would such an optimization still make sense in disease states that might also be of interest?

      This is a brilliant question, which we decided to explore further in separate studies. Specifically, the link between optimal communication and brain disorders is a natural next step that we are pursuing. Here, we expanded our discussion with a few lines first explaining the roots of our main assumption, which is that neurons optimize information flow, among other goals. We then hypothesized that the biological mechanisms by which this goal is achieved include (based on our findings) adopting a broadcasting regime of signaling. We suspect that this mode of communication, operationalized on complex network topologies, is a trade-off between robust signaling and energy efficiency. Currently, we are planning practical steps to test this hypothesis.

      Lines 943-962:

      “Nonetheless, our framework is grounded in game theory, whose fundamental assumption is that nodes aim to maximize their influence over each other, given the existing constraints. This assumption is well explored using various theoretical frameworks (Buehlmann and Deco, 2010; Bullmore and Sporns, 2012; Chklovskii et al., 2002; Laughlin and Sejnowski, 2003; O’Byrne and Jerbi, 2022) and remains open to further empirical investigation. Here, we used game theory to mathematically formalize a theoretical optimum for communication in brain networks. Our findings then provide a possible mechanism for achieving this optimality through broadcasting. Based on our results, we speculate that there exists an optimal broadcasting strength that balances the robustness of the signal against its metabolic cost. This hypothesis is reminiscent of the concept of brain criticality, which suggests that the brain is positioned in a state in which information propagates maximally and efficiently (O’Byrne and Jerbi, 2022; Safavi et al., 2024). Together, we suggest broadcasting as a possible mechanism by which communication is optimized in brain networks; however, further research should investigate whether signaling within brain networks indeed aligns with a game-theoretic definition of optimality. If it does, subsequent studies could then examine how deviations from optimal communication contribute to or result from various brain states or neurological and psychiatric disorders.”

      Reviewer #1 (Recommendations for the authors):

      I would recommend that the authors consider the following point in a revision, as well as the major weaknesses of the public review. Some aspects of Figure 1 could be clearer. What is being illustrated by the looping arrow to MSA? What is being represented in the matrices (labeling "source" and "target" on the matrix might enhance clarity)? Is R2 the metric used to assess the degree of similarity between communication models? These could be addressed by making small additions to the figure legend or to the figure itself.

      Thank you for your constructive comment on Figure 1, which is arguably the most important figure in the manuscript. We adjusted the figure and its caption (see above) based on your suggestions. After doing so, we think the figure is now clearer regarding the pipeline used in this work.

      Reviewer #2 (Recommendations for the authors):

      Overall, as stated in the public review and the short assessment, the manuscript is in a clearly mature state and brings an important method to link the fields of structural and functional brain networks.

      Nevertheless, the paper would benefit from an early, and clear, discussion of the:

      (1) the components of the model, and the assumptions of each, which should be stated at the end of the introduction or early in the Results; and

      (2) the datasets necessary to run the analysis.

      The confusion arises from lines 130-131, stating "In the present work (summarized in Figure 1), we used the human connectome, large-scale models of dynamics, and a game-theoretical perspective of signaling." This, to me, indicated that a structural connectivity map might be the only dataset required, as the dynamics model and the game-theory component are purely simulated. However, lines 214-216 later state that the empirical functional connectivity is estimated from the structural connectivity, indicating that the method applies only to cases where we have both.

      Finally, Supplemental Figure 5 validates a number of metrics on different solely structural networks (which is a very necessary and well-done control). Similarly, while the dynamical model is discussed in depth, and beautifully shown that the specific choice of dynamical model does not directly impact the results, it would be helpful to clarify the dynamical model utilized in the early figures.

      Thank you for pointing out a critical detail that we missed elaborating sufficiently early in the paper: the modelling step. Following your suggestions, we added a paragraph from line 292 to 305 (page 10) expanding on the modelling framework. We also explicitly divided the modelling step in Figure 1 and briefly clarified our modelling choices in the caption. Together, we emphasized the fact that our framework is generally model agnostic, which allows different models of dynamics to be plugged into various anatomical networks. We then clarified that, like in any modelling effort, one needs to first fit/optimize the model parameters to reproduce empirical data. In other words, we emphasized the fact that our framework relies on a computational model as its ‘game’ to infer how regions interact, and we fine-tuned our models to reproduce the empirical FC.

      Again, this is not a critique of the methods, which are excellent, but the presentation. It would help readers, and even me, to have a clear indication of the model earlier. Further, it would help to discuss, both in the introduction and discussion, the datasets required for applying these methods more broadly. For instance, 2-photon recordings are discussed - would it be possible to apply this method then to EM connectomes with functional data recorded for them? In theory, it seems like yes, although the current datasets have 100% observability, whereas 2-photon imaging, or other local methods, will not have perfect overlap between structural and functional connectomes. Discussions like this, related to the assumptions of the model, the necessary datasets, and broader application directions beyond DSI, fMRI, and BOLD cases where the method was validated, would increase the impact and interpretability for a broad readership.

      This is a valid point that we should have been more explicit about. The revised manuscript now contains a paragraph (lines 740-748) clarifying the fact that, throughout this work, we assumed full observability. We then briefly discuss, based on the mathematical principles of the framework, what we expect to happen in cases with partial observability. We then point at two references in which the details of a framework with partial observability are laid out, one containing mathematical proofs and the other using numerical simulations.

      References:

      Hadaeghi, F., Fakhar, K., & Hilgetag, C. C. (2024). Controlling Reciprocity in Binary and Weighted Networks: A Novel Density-Conserving Approach (p. 2024.11.24.625064). bioRxiv. https://doi.org/10.1101/2024.11.24.625064

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper addresses the problem of optimising the mapping of serum antibody responses against a known antigen. It uses the cryoEM analysis of polyclonal Fabs to antibody genes, with the ultimate aim of getting complete and accurate antibody sequences. The method, commonly termed EMPEM, is becoming increasingly used to understand responses in convalescent sera and optimisation of the workflows and

      The authors do not address the experimental aspects of the methods and do not present novel computational tools, rather they use a series of established computational methods to provide workflows that simplify the interpretation of the EM map in terms of the sequences of dominant antibodies.

      We would like to thank the reviewer for this assessment. While we indeed implement ModelAngelo as published, without changes to its algorithms or code, we did add new functionality to Stitch to read the output generated by ModelAngelo and assemble it against known databases of germline-encoded antibody sequences. Of note, ModelAngelo was not primarily developed to determine exact sequences from cryoEM data, but rather to provide input for sequence determination via sequence searches with profile HMMs. Such models are designed to handle ambiguous residue calls at different positions of a protein sequence. In our opinion, one of the main contributions of our study is to finally benchmark the EMPEM approach against known sequences, in order to build a framework for future data quality requirements. Our study shows that, in best-case scenarios, EM data alone provide sequences at 80-90% accuracy. In other words, the sequences are riddled with errors and cannot be taken at face value without orthogonal sequencing data. We demonstrate that mass spectrometry data can fill this requirement and yield much improved sequence accuracy, even against high backgrounds of unrelated antibody sequences. We are incredibly excited about the prospects and future developments for EMPEM and believe that its integration with orthogonal sequencing approaches like MS is critical moving forward. By developing this pipeline, we hope to have taken steps in the right direction.

      Strengths:

      The paper is well-written and clearly argued. The tests constructed seem appropriate and fair and demonstrate that the workflow works pretty well. For a small subset (~17%) of the EMPEM maps analysed the workflow was able to get convincing assignments of the V-genes.

      Thanks for the kind assessment.

      Weaknesses:

      The AI methods used are not a substitute for high quality data and at present very few of the results obtained from EMPEM will be of sufficient quality to robustly assign the sequence of the antibody. However, rather more are likely to be good enough, especially in combination with MS data, to provide a pretty good indication of the V-gene family.

      We fully agree with the reviewer that this is a general limitation of the EMPEM field. If anything, we hope that our benchmark study and the pipeline we developed to integrate MS-based sequencing data have more clearly established the current limitations of the technique and the requirements and prospects for orthogonal sequencing data to fill the missing gaps.

      Reviewer #2 (Public review):

      In this manuscript, the authors seek to demonstrate that it is possible to sequence antibody variable domains from cryoEM reconstructions in combination with bottom-up LC-MSMS. In particular, they extract de novo sequences from single particle-cryo-EM-derived maps of antibodies using the "deep-learning tool ModelAngelo", which are run through the program Stitch to try to select the top scoring V-gene and construct a placeholder sequence for the CDR3 of both the heavy and light chain of the antibody under investigation. These reconstructed variable domains are then used as templates to guide the assembly of de novo peptides from LC-MS/MS data to improve the accuracy of the candidate sequence.

      Using this approach the authors claim to have demonstrated that "cryoEM reconstructions of monoclonal antigen-antibody complexes may contain sufficient information to accurately narrow down candidate V-genes and that this can be integrated with proteomics data to improve the accuracy of candidate sequences".

      While the approach is clearly a work in progress, the manuscript should be made easier for the general reader to understand. Indeed, I had a hard time understanding the workflow until I got to Fig. 3, so re-ordering the figures, for example, may be helpful in this regard.

      It would be useful to provide additional concrete examples where the described workflow would assist in the elucidation of CDR3's, in cases where this isn't already known. (In the benchmark dataset from the Electron Microscopy Data Bank, all the antibodies and Fabs are presumably known, as is the case for the monoclonal antibody CR3022). I am having difficulty envisioning how one would prepare samples from actual plasma samples that would be appropriate for single particle cryo-EM and MS data on dominant antibodies of interest. In my experience, most of these samples tend to be quite complex mixtures. So additional discussion of this point would be helpful.

      We would like to thank the reviewer for their kind and critical assessment of our work. We have adopted the suggestion to reorder the graphical material, such that the workflow schematic is now Figure 1 in the main text. We hope this will improve the readability.

      Regarding the concrete examples where the workflow could aid in elucidating CDR3 sequences, we would like to refer to all published EMPEM studies and in particular those highlighted in Figure 6. We are also actively working to integrate EMPEM data with MS-based sequencing on novel samples, but those will be subject of later studies. We have added additional discussion regarding the experimental feasibility of the approach. We have highlighted several milestone results where functional antibodies were reconstructed from EMPEM and/or MS data. In the discussion we write:

      “While sample complexity remains an important bottleneck, and questions remain about the dynamic range of the true serum antibody repertoire and the depth of coverage from these novel experimental approaches, several studies have recently reached the important milestone of reconstructing functional antibodies from direct measurements of the secreted serum components.” (see references in manuscript)

      “We believe that both EMPEM and MS-based polyclonal antibody sequencing are still limited to the top 1-10 antibodies in the polyclonal mixture. The EMPEM approach is biased towards bigger and well-ordered target antigens, which calls for additional complementary approaches like HDX-MS for a comprehensive polyclonal epitope mapping exercise.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Line 172: I am surprised the heavy chain is not worse than the light chain

      We have added the following sentence:

      “The length of the complete antigen binding loops was estimated with an average error of 0.5 ± 3.3 or 1.7 ± 6.0 residues for heavy and light chains, respectively, with average sequence identities of 0.63 and 0.41. While CDRH3 is the more challenging region in MS-based approaches to antibody sequencing, we believe that the moderately better length and sequence accuracy of CDRH3 compared to CDRL3 in ModelAngelo output reflects the CDRH3’s notoriously tight involvement in antigen binding, and hence its greater relative stability in the antibody-antigen complex, resulting in better order in the reconstructed EM density maps.”

      Line 175: Global FSC is not going to be useful. Why not use a local value?

      We agree that local resolution estimates would be more appropriate; that is exactly why we added this remark to our initial analysis. However, local resolution estimates are non-trivial and raise the question of ‘how local’ the estimate of map quality needs to be (see for instance https://doi.org/10.1016/j.sbi.2020.06.005). At present, we believe that the work required for this local resolution analysis is not warranted, as it would only arrive at the rather intuitive, if not tautological, conclusion that better map quality translates into more accurate sequences. While we agree that a better quantitative understanding of the data requirements for EMPEM could benefit the field, we opted not to pursue it here, especially considering that the Stitch alignment score is already a good alternative to map resolution as a predictor of sequence accuracy, as demonstrated in Figure 3.

      Line 259: 'of the 23 maps' .... Actually there were 46 maps originally, so I feel this is a tad misleading.

      The statistic of ‘46 total’ was added to the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Alternate explanations for major conclusions.

      The major conclusions are (a) surface motility of W3110 requires pili, which is not novel; (b) pili synthesis and pili-dependent surface motility require putrescine (1 mM is optimal and 4 mM is inhibitory); and (c) a putrescine homeostatic network exists that maintains intracellular putrescine and involves compensatory mechanisms for low putrescine, including diversion of energy generation toward putrescine synthesis.

      Conclusion a: Reviewer 3 suggests that the mutant may have lost surface motility because of outer surface structures that actually mediate motility but are co-regulated with or depend on pili synthesis. The reviewer explicitly suggests flagella as the alternate appendage, although flagella and pili are reciprocally regulated. Most experiments were performed in a ΔfliC background, which lacks the major flagella subunit, in order to prevent the generation of fast-moving flagella-dependent variants. Furthermore, no other surface structure that could mediate surface motility is apparent in the electron microscope images. This observation does not definitively rule out this possibility, especially because of the large transcriptomic changes with low putrescine. Our explanation is the simplest.

      Conclusion b, first comment: Reviewer 1 states that “it is not possible to conclude that the effects of gene deletions to biosynthetic, transport or catabolic genes on pili-dependent surface motility are due to changes in putrescine levels unless one takes it on faith that there must be changes to putrescine levels.” The comment ignores both the nutritional supplementation and the transcript changes that strongly suggest compensatory mechanisms for low putrescine. Why compensate if the putrescine concentration does not change? The reviewer then implicitly acknowledges changes in putrescine content: “it is important to know how much putrescine must be depleted in order to exert a physiological effect”.

      Conclusion b, second comment: Reviewer 1 proposes that agmatine accumulation can account for some of the observed properties but does not specify which property. With respect to motility, agmatine accumulation cannot account for motility defects because motility is impaired in (a) a speA mutant, which cannot make agmatine, and (b) a speC speF double mutant, which should not accumulate agmatine. With respect to the transcriptomic results, even if high agmatine is the reason for some transcript changes, the results still suggest a putrescine homeostasis network.

      Conclusion c: The reviewers made no comments on the RNA-seq analysis or on our interpretation that a homeostatic network exists.

      Additional experiments proposed.

      Complementation. Reviewers 1 and 3 suggested complementation experiments, although the latter states that nutritional supplementation strengthens our arguments. The most relevant complementation is with speB. We tried complementation and found that our control plasmid inhibited motility by increasing the lag time before movement commenced. A plasmid with speB did stimulate motility relative to the control plasmid, but movement with the speB plasmid took 4 days, while wild-type movement took 1.5 days. We think that this result is ambiguous to interpret. We did not systematically search for plasmids that had no effect on motility.

      The purpose of complementation is to determine whether a second-site mutation is the actual cause of the motility defect. In this case, the potential artifact would be that an alteration in polyamine metabolism is not the cause of the defect. However, external putrescine reverses the effects on motility and pili synthesis in the speB mutant, which is inconsistent with a second-site mutation. Still, we agree that complementation is important, and because of our difficulties, we tested numerous mutants with defects in polyamine metabolism. The results present an interpretable and coherent pattern. For example, if putrescine were not the regulator, then mutations in putrescine transport and catabolism should have had no effect. Every single mutant is consistent with a role for putrescine in movement and pili synthesis. The simplest explanation is that putrescine affects movement and pili synthesis.

      Phase variation. Reviewer 2 noted that we did not discuss phase variation. The comment came from the observation that the speB mutant had fewer fimB transcripts, which could explain the loss of motility. The reviewer also suggested a simple experiment, which we performed; we found that putrescine does not control phase variation. We present those results in the supplemental material. Our discussion of this topic includes a major qualification.

      Testing of additional strains. Published results from another lab showed that surface motility of MG1655 requires spermidine instead of putrescine (PMID 19493013 and 21266585). MG1655 and the W3110 strain that we used in our study are both E. coli K-12 derivatives in phylogenetic group A. Any number of changes in enzymes that affect intracellular putrescine concentration could result in different responses to putrescine. We are currently studying pili synthesis and motility in other strains. While that study is incomplete, loss of speB in a strain of phylogenetic group D does not eliminate surface motility. This work was intended as our initial analysis, and the focus was on a single strain.

      Measuring intracellular polyamines. We felt that we had provided sufficient evidence to conclude that putrescine controls pili synthesis and that putrescine concentrations are lower in the speB mutant: the nutritional supplementation results; the lower levels of transcripts for putrescine catabolic enzymes, whose expression requires putrescine, which strongly suggest lower putrescine in a mutant lacking a putrescine biosynthesis gene; and a transcriptomic analysis showing that the speB mutant had transcript changes that compensate for low putrescine. We understand the importance of measuring intracellular polyamines. We are currently examining the quantitative relationship between intracellular polyamines and pili synthesis in multiple strains that respond differently to loss of speB.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors should measure putrescine, agmatine, cadaverine, and spermidine levels in their gene deletion strains.

      Polyamine concentration measurements will be part of a separate study on polyamine control of pili synthesis of a uropathogenic strain. A comparison is essential, and the results from W3110 will be part of that study.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 28. Your statements about urinary tract infections are pure speculation. They are fine for the discussion, but should not be in the abstract.

      The abstract from line 27 onward has been reworked. The reviewer's comment is fair.

      (2) Line 65. Do we need this discussion about the various strains? If you keep it, you should point out that they were all W3110 strains. But you could just say that you confirmed that your background strain can do PDSM (since you are also not showing any data for the other isolates). Discussing the various strains implies that you are not confident in your strain and raises the question of why you didn't use a sequenced wt MG1655, or something like that.

      This section has been reworked. Our strain of W3110 has an insertion in fimB, which is relevant for movement but does not affect our results. The insertion limits our conclusions about phase variation. We want to point out that variation among strains is large. We also sequenced our strain of W3110.

      (3) Related. You occasionally use "W3110-LR" to designate the wild type. Whether or not you use this, be consistent throughout the text.

      Fixed

      (4) Line 99. Does eLife allow "data not shown"?  

      (5) Line 119. As you note, the phenotype of the puuA patA double mutant is exactly the opposite of what one would expect. Although you provide additional evidence that high levels also inhibit motility, complementing the double mutant would provide confidence that the strain is correct.

      We rapidly ran into issues with complementation, which are discussed in our public response to the reviewer comments.

      (6) Figure 6C. Either you need to quantify these data or you need a better picture.

      The files were corrupted. The experiment was repeated several times, but we lost the other data.

      (7) Figure 7. Label panels A and B to indicate that these strains are speB. Also, you need to switch panels C and D to match the order of discussion in the manuscript.

      Done

      (8) Line 134. Is there a statistically significant difference in the ELISA between 1 and 4 mM? You need to say one way or the other.

      The difference is not statistically significant, and this has been added to the paper.

      (9) Figure 10C. You need to quantify these data.

      Quantification added as an extra panel.

      (10) Line 164. You include H-NS in the group of "positive effectors that control fim operon expression" and you reference Ecocyc, rather than any primary reference. Nowhere in the manuscript do you mention phase variation. In the speB mutant, you see decreased fimB, increased fimE, and decreased hns expression. My interpretation of the literature suggests that this would drive the fim switch to the off-state. This could certainly explain some of the results. It is also easily measurable with PCR. This might require testing cells scraped directly from the plates.

      The experiments were performed. There is no need to scrape cells from plates because the fimB result from RNA-seq was from a liquid culture, and the prediction is that phase locking would be evident in these cells.

      (11) Figure 10. Likewise, do you know that your hns mutant is not locked in the off-state? Granted, the original hns mutants (pilG) showed increased rates of switching, but growth conditions might matter.

      We also assessed phase variation in the hns mutant, and the hns mutant was not phase locked. This result is shown. In addition to growth conditions, the strain probably matters.

      (12) Line 342. You describe the total genome sequencing of W3110, yet this is not mentioned anywhere else in the manuscript.

      It is now mentioned.

      Minor points:

      (13) Line 192. "One of the most differentially expressed genes...".

      (14) Line 202. "...implicates extracellular putrescine in putrescine homeostasis."

      (15) Line 209. "...potential pili regulators...".

      (16) You are using a variety of fonts on the figures. Pick one.

      (17) Figure 9A. It took me a few minutes to figure out the labeling for this figure and I was more confused after reading the legend. It would be simpler to independently label red triangles, blue triangles, red circles, and blue circles.

      (18) Figure 9B and 10. The reader can likely figure out what W3110_1.0_3 means, but more straightforward labeling would be better, or you need to define these labels.

      All points were addressed and fixed.

      Reviewer #3 (Recommendations for the authors):

      Other comments:

      (1) Please go through the figures and the reference to figures in the text, as they often do not refer to the right panel (ex: figures 2 and 7 for instance). In the text, please homogenize the reference to figures (Figure 2C vs Figure 3). To help compare motility experiments between figures, please use the same scale in all figures.

      This has been fixed.

      (2) Lines 65-70: I am not sure I get the reason behind choosing the W3110 strain from your lab stock. In what background were the initial mutants constructed (from l.64-65)? Were the nine strains tested all variants of W3110? If so, is the phenotype described in the manuscript robust in all strains?

      We have provided more explanation. W3110 was the most stable: insertions that allowed flagella synthesis in the presence of glucose were frequent. We deleted the major flagella subunit for most experiments. Before introduction of the fliC deletion, we needed to perform experiments 10 times so that fast-moving variants, which had mutationally altered flagella synthesis, did not complicate results.

      (3) Line 82-84: As stated in the public review, I think more controls are needed before making this conclusion, especially as type I fimbriae are usually involved in sessile phenotypes.

      Response provided in the public response.

      (4) In Figure 3: Changing the order of the image to follow the text would make the figure easier to follow.

      Fixed as requested

      (5) Lines 100-101: simultaneous - the results presented here do not support this conclusion. In Figure 4b, the addition of putrescine to speB mutants is actually not different from WT. From the results, it seems like one of biosynthesis or transport is needed, but it's not clear if both are needed simultaneously. For this, a mutant with no biosynthesis and no transport is needed and/or completely non-motile mutants would be needed to compare.

      We disagree. If there are two pathways of putrescine synthesis and both are needed, then our conclusion follows.

      (6) Lines 104-105: '... because E. coli secretes putrescine.' - not sure why this statement is there, as most transporters tested after are importers of putrescine? It is also not clear to me if putrescine is supplemented in the media in these experiments. If not, is there putrescine in the GT media?

      Good points, and this section has been reworded to clarify these issues. Some of the material was moved to the discussion.

      (7) Line 109: 'We note that potE and plaP are more highly expressed than potE and puuP...' - first potE should be potF?

      This has been corrected.

      (8) Figure 8: What is the difference between the TEM images in Figure 1 and here? The WT in Figure 1 does show pili without the supplementation unless I'm missing something here. Please specify.

      The reviewer means Figure 2 and not Figure 1. Figure 2 shows a wild-type strain, which has both putrescine anabolic pathways, while Figure 8 shows the ΔspeB strain, which lacks one pathway.

      (9) Lines 160-162: Transcripts for the putrescine-responsive puuAP and puuDRCBE operons, which specify genes of the major putrescine catabolic pathway, were reduced from 1.6- to 14-fold (FDR ≤ 0.02) in the speB mutant (Supplemental Table 1), which implies lower intracellular putrescine. I might not get exactly the point here. If the catabolic pathways are repressed in the speB mutant, then there will be less degradation, which means more putrescine!?

      Expression of these genes is a function of intracellular putrescine: higher expression means more putrescine. Any discussion of steady-state putrescine must include the anabolic pathways: the catabolic pathways do not determine intracellular putrescine; they reflect it.

      (10) Lines 162-163: Deletion of speB reduced transcripts for genes of the fimA operon and fimE, but not of fimB. It seems that the results suggest the opposite: a reduction of fimB but not fimE!?

      The reviewer is correct; this was our mistake, and the text now states what is in the figure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This manuscript presents an interesting exploration of the potential activation mechanisms of DLK following axonal injury. While the experiments are beautifully conducted and the data are solid, I feel that there is insufficient evidence to fully support the conclusions made by the authors.

      In this manuscript, the authors exclusively use the puc-lacZ reporter to determine the activation of DLK. This reporter has been shown to be induced when DLK is activated.

      However, there is insufficient evidence to confirm that the absence of reporter activation necessarily indicates that DLK is inactive. As with many MAP kinase pathways, the DLK pathway can be locally or globally activated in neurons, and the level of DLK activation may depend on the strength of the stimulation. This reporter might only reflect strong DLK activation and may not be turned on if DLK is weakly activated. The results presented in this manuscript support this interpretation. Strong stimulation, such as axotomy of all synaptic branches, caused robust DLK activation, as indicated by puc-lacZ expression. In contrast, weak stimulation, such as axotomy of some synaptic branches, resulted in weaker DLK activation, which did not induce the puc-lacZ reporter. This suggests that the strength of DLK activation depends on the severity of the injury rather than the presence of intact synapses. Given that this is a central conclusion of the study, it may be worthwhile to confirm this further. Alternatively, the authors may consider refining their conclusion to better align with the evidence presented.

      In Figure 1E we have replotted the puc-lacZ data to show comparisons between different injuries that leave different numbers of spared (or lost) boutons and branches. We observed no differences among injuries that remove only a small fraction of boutons (injury location (a)), injuries that remove nearly all of them (injury locations (b) and (c)), and uninjured neurons (Figure 1E). These observations argue against the interpretation that the strength of DLK activation (at least within the cell body) depends on the severity of injury. Rather, puc-lacZ induction appears to be bimodal: it is either induced (in various injuries that remove all synaptic boutons) or not induced (including in injuries that spare only a small fraction of the total boutons). We therefore think that the presence of a remaining synaptic connection, rather than the extent of the injury per se, is a major determinant of whether the cell body component of Wnd signaling can be activated.

      The reviewer (and others) fairly point out that our current study focuses on puc-lacZ as a reporter of Wnd signaling in the cell body. We consider this to be a downstream integration of events in axons that are more challenging to detect. It is striking that this integration appears strongly sensitized to the presence of spared synaptic boutons. Examination of Wnd’s activation in axons and synapses is a goal for our future work.

      As noted by the authors, DLK has been implicated in both axon regeneration and degeneration. Following axotomy, DLK activation can lead to the degeneration of distal axons, where synapses are located. This raises an important question: how is DLK activated in distal axons? The authors might consider discussing the significance of this "synapse connection-dependent" DLK activation in the broader context of DLK function and activation mechanisms.

      While it has been noted that inhibition of DLK can mildly delay Wallerian degeneration (Miller et al., 2009), this does not appear to be the case for retinal ganglion cell axons following optic nerve crush (Fernandes et al., 2014). It is also not the case for Drosophila motoneurons and NMJ terminals following peripheral nerve injury (Xiong et al., 2012; Xiong and Collins, 2012). Instead, overexpression of Wnd or activation of Wnd by a conditioning injury leads to an opposite phenotype - an increase in resiliency to Wallerian degeneration for axons that have been previously injured (Xiong et al., 2012; Xiong and Collins, 2012). The downstream outcome of Wnd activation is highly dependent on the context; it may be an integration of the outcomes of local Wnd/DLK activation in axons with downstream consequences of nuclear/cell body signaling.  The current study suggests some rules for the cell body signaling, however, how Wnd is regulated at synapses and why it promotes degeneration in some circumstances but not others are important future questions.

      For the reviewer’s suggestion, it is interesting to consider DLK’s potential contributions to the loss of NMJ synapses in a mouse model of ALS (Le Pichon et al., 2017; Wlaschin et al., 2023). Our findings suggest that the synaptic terminal is an important locus of DLK regulation, while dysfunction of NMJ terminals is an important feature of the ‘dying back’ hypothesis of disease etiology (Dadon-Nachum et al., 2011; Verma et al., 2022). We propose that the regulation of DLK at synaptic terminals is an important area for future study, and may reveal how DLK might be modulated to curtail disease progression. Of note, DLK inhibitors are in clinical trials (Katz et al., 2022; Le et al., 2023; Siu et al., 2018), but at least some have been paused due to safety concerns (Katz et al., 2022). Further understanding of the mechanisms that regulate DLK are needed to understand whether and how DLK and its downstream signaling can be tuned for therapeutic benefit.

      Reviewer #2 (Public review):

      Summary:

      The authors study a panel of sparsely labeled neuronal lines in Drosophila that each form multiple synapses. Critically, each axonal branch can be injured without affecting the others, allowing the authors to differentiate between injuries that affect all axonal branches versus those that do not, creating spared branches. Axonal injuries are known to cause Wnd (mammalian DLK)-dependent retrograde signals to the cell body, culminating in a transcriptional response. This work identifies a fascinating new phenomenon: this injury response is not all-or-none. If even a single branch remains uninjured, the injury signal is not activated in the cell body. The authors rule out that this could be due to changes in the abundance of Wnd (perhaps if incrementally activated at each injured branch) regulated by Hiw, Wnd's known negative regulator. Thus there is both a yet-undiscovered mechanism to regulate Wnd signaling, and more broadly a mechanism by which the neuron can integrate the degree of injury it has sustained. It will now be important to tease apart the mechanism(s) of this fascinating phenomenon. But even absent a clear mechanism, this is new biology that will inform the interpretation of injury signaling studies across species.

      Strengths:

      (1) A conceptually beautiful series of experiments that reveal a fascinating new phenomenon is described, with clear implications (as the authors discuss in their Discussion) for injury signaling in mammals.

      (2) Suggests a new mode of Wnd regulation, independent of Hiw.

      Weaknesses:

      (1) The use of a somatic transcriptional reporter for Wnd activity is powerful; however, the reporter indicates whether the transcriptional response was activated, not whether the injury signal was received. It remains possible that Wnd is still activated in the case of a spared branch, but that this activation is either local within the axons (impossible to determine in the absence of a local reporter) or that the retrograde signal was indeed generated but was somehow insufficient to activate transcription when it entered the cell body. This is more of a mechanistic detail and should not detract from the overall importance of the study.

      We agree. The puc-lacZ reporter tells us about signaling in the cell body, but whether and how Wnd is regulated in axons and synaptic branches, which we think occurs upstream of the cell body response, remains to be addressed in future studies.

      (2) That the protective effect of a spared branch is independent of Hiw, the known negative regulator of Wnd, is fascinating. But this leaves open a key question: what is the signal?

      This is indeed an important future question, and would still be a question even if Hiw were part of the protective mechanism by the spared synaptic branch. Our current hypothesis (outlined in Figure 4) is that regulation of Wnd is tied to the retrograde trafficking of a signaling organelle in axons. The Hiw-independent regulation complements other observations in the literature that multiple pathways regulate Wnd/DLK (Collins et al., 2006; Feoktistov and Herman, 2016; Klinedinst et al., 2013; Li et al., 2017; Russo and DiAntonio, 2019; Valakh et al., 2013). It is logical for this critical stress response pathway to have multiple modes of regulation that may act in parallel to tune and restrain its activation. 

      Reviewer #3 (Public review):

      Summary:

      This manuscript seeks to understand how nerve injury-induced signaling to the nucleus is influenced, and it establishes a new location where these principles can be studied. By identifying and mapping specific bifurcated neuronal innervations in the Drosophila larvae, and using laser axotomy to localize the injury, the authors find that sparing a branch of a complex muscular innervation is enough to impair Wallenda-puc (analogous to DLK-JNK-cJun) signaling that is known to promote regeneration. It is only when all connections to the target are severed that cJun-transcriptional activation occurs.

      Overall, this is a thorough and well-performed investigation of the mechanism of spared-branch influence on axon injury signaling. The findings on control of wnd are important because this is a very widely used injury signaling pathway across species and injury models. The authors present detailed and carefully executed experiments to support their conclusions. Their effort to identify the control mechanism is admirable and will be of aid to the field as they continue to try to understand how to promote better regeneration of axons.

      Strengths:

      The paper does a very comprehensive job of investigating this phenomenon at multiple locations and through both pinpoint laser injury as well as larger crush models. They identify a non-hiw based restraint mechanism of the wnd-puc signaling axis that presumably originates from the spared terminal. They also present a large list of tests they performed to identify the actual restraint mechanism from the spared branch, which has ruled out many of the most likely explanations. This is an extremely important set of information to report, to guide future investigators in this and other model organisms on mechanisms by which regeneration signaling is controlled (or not).

      Weaknesses:

      The weakest data presented by this manuscript is the study of the actual amounts of Wallenda protein in the axon. The authors argue that increased Wnd protein is being anterogradely delivered from the soma, but no support for this is given. Whether this change is due to transcription/translation, protein stability, transport, or other means is not investigated in this work. However, because this point is not central to the arguments in the paper, it is only a minor critique.

      We agree and are glad that the reviewer considers this a minor critique; this is an area for future study. In Supplemental Figure 1 we present differences in the levels of an ectopically expressed GFP-Wnd-kinase-dead transgene, which is strikingly increased in axons that have received a full but not partial axotomy. We suspect this accumulation occurs downstream of the cell body response because of the timing. We observed the accumulations after 24 hours (Figure S1F) but not at early (1-4 hour) time points following axotomy (data not shown). Further study of the local regulation of Wnd protein and its kinase activity in axons is an important future direction.

      As far as the scope of impact: because the conclusions of the paper are focused on a single (albeit well-validated) reporter in different types of motor neurons, it is hard to determine whether the mechanism of spared branch inhibition of regeneration requires wnd-puc (DLK/cJun) signaling in all contexts (for example, sensory axons or interneurons). Is the nerve-muscle connection the rule or the exception in terms of regeneration program activation?

      DLK signaling is strongly activated in DRG sensory neurons following peripheral nerve injury (Shin et al., 2012), despite the fact that sensory neurons have bifurcated axons and their projections in the dorsal spinal cord are not directly damaged by injuries to the peripheral nerve. Therefore, it is unlikely that protection by a spared synapse is a universal rule for all neuron types. However, the molecular mechanisms that underlie this regulation may indeed be shared across different types of neurons but utilized in different ways. For instance, nerve growth factor withdrawal can lead to activation of DLK (Ghosh et al., 2011); however, neurotrophins and their receptors are regulated and implemented differently in different cell types. We suspect that the restraint of Wnd signaling by the spared synaptic branch shares a common underlying mechanism with the restraint of DLK signaling by neurotrophin signaling. Further elucidation of the molecular mechanism is an important next step towards addressing this question.

      Because changes in puc-lacZ intensity are the major readout, it would be helpful to better explain the significance of the amount of puc-lacZ in the nucleus with respect to the activation of regeneration. Is it known that scaling up the amount of puc-lacZ transcription scales functional responses (regeneration or others)? The alternative would be that only a small amount of puc-lacZ is sufficient to efficiently induce relevant pathways (threshold response).

      While induction of puc-lacZ expression correlates with Wnd-mediated phenotypes, including sprouting of injured axons (Xiong et al., 2010), protection from Wallerian degeneration (Xiong et al., 2012; Xiong and Collins, 2012) and synaptic overgrowth (Collins et al., 2006), we have not observed any correlation between the degree of puc-lacZ induction (e.g., modest, medium, or high) and the phenotypic outcomes (sprouting, overgrowth, etc.). Rather, there appears to be a striking all-or-none difference in whether puc-lacZ is induced or not induced. There may indeed be a threshold that can be restrained through multiple mechanisms. We posit in Figure 4 that restraint may take place in the cell body, where it can be influenced by the spared bifurcation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      This is a beautiful study. Naturally, you're searching now for the underlying mechanism.

      A few questions:

      (1) At present you can not determine if the Wnd signal is never initiated (when a spared branch is present) or if it gets to the cell body but is incapable of activating the puckered reporter. Is there any optical reporter (JNK activation?) that could differentiate this?

      The reviewer is correct that a tool to detect local activity of JNK kinase in axons would be ideal for probing the mechanisms that underlie our observations. A FRET reporter for JNK kinase activity has been developed and utilized in cultured cells (Fosbrink et al., 2010). It would be interesting to implement this reporter in Drosophila; it would need to be sensitive enough to visualize JNK activity in single Drosophila axons. We have previously noted Wnd-dependent phosphorylated JNK in the cell body of injured motoneurons following nerve crush (Xiong et al., 2010). However, anti-pJNK antibodies detect what appears to be a constitutive signal in uninjured axons that does not appear to be influenced by activation or inhibition of Wnd (Xiong et al., 2010).

      (2) What happens when you injure the axon in a dSarm KO? This is more of a curiosity, not a necessity, but is it the axon dying or the detection of the injury itself?

      We have tested whether overexpression of Nmnat or the WldS transgene, which inhibit Wallerian degeneration of injured axons, affect the induction of puc-lacZ following nerve injury. This manipulation has no effect on puc-lacZ expression in uninjured animals, and also has no effect on the induction of puc-lacZ following peripheral nerve crush (TJ Waller, personal communication).

      (3) Are Wnd rescue experiments possible in this context? Would be an interesting place to do Wnd structure-function and compare it to the synaptic work.

      This is not possible with current reagents. Expression of wild-type wnd cDNA under the Gal4/UAS promoter leads to strong induction of puc-lacZ in uninjured animals, even when weak Gal4 driver lines are used (Xiong et al., 2012, 2010). Similar observations of constitutively active signaling have been made in expression studies of DLK in mammalian cells (Hao et al., 2016; Huntwork-Rodriguez et al., 2013; Nihalani et al., 2000; and data not shown). These and other observations suggest that the levels of Wnd/DLK protein are tightly controlled by posttranscriptional mechanisms. Delineation of sequences within Wnd/DLK that are required for its regulation would be helpful for addressing this question.

      This will be required reading in my lab.

      That is an honor. We look forward to help from the field in understanding how and why this pathway is restrained at synapses. Your students may bring new ideas to the table.

      Reviewer #3 (Recommendations for the authors):

      Piezo is spelled incorrectly in the supplemental table in multiple places.

      Thank you for pointing this out! We have made the correction.

      References cited (in rebuttal)

      Collins CA, Wairkar YP, Johnson SL, DiAntonio A. 2006. Highwire restrains synaptic growth by attenuating a MAP kinase signal. Neuron 51:57–69.

      Dadon-Nachum M, Melamed E, Offen D. 2011. The “dying-back” phenomenon of motor neurons in ALS. J Mol Neurosci 43:470–477.

      Feoktistov AI, Herman TG. 2016. Wallenda/DLK protein levels are temporally downregulated by Tramtrack69 to allow R7 growth cones to become stationary boutons. Development 143:2983–2993.

      Fernandes KA, Harder JM, John SW, Shrager P, Libby RT. 2014. DLK-dependent signaling is important for somal but not axonal degeneration of retinal ganglion cells following axonal injury. Neurobiol Dis 69:108–116.

      Ghosh AS, Wang B, Pozniak CD, Chen M, Watts RJ, Lewcock JW. 2011. DLK induces developmental neuronal degeneration via selective regulation of proapoptotic JNK activity. J Cell Biol 194:751–764.

      Hao Y, Frey E, Yoon C, Wong H, Nestorovski D, Holzman LB, Giger RJ, DiAntonio A, Collins C. 2016. An evolutionarily conserved mechanism for cAMP elicited axonal regeneration involves direct activation of the dual leucine zipper kinase DLK. Elife 5. doi:10.7554/eLife.14048

      Huntwork-Rodriguez S, Wang B, Watkins T, Ghosh AS, Pozniak CD, Bustos D, Newton K, Kirkpatrick DS, Lewcock JW. 2013. JNK-mediated phosphorylation of DLK suppresses its ubiquitination to promote neuronal apoptosis. J Cell Biol 202:747–763.

      Katz JS, Rothstein JD, Cudkowicz ME, Genge A, Oskarsson B, Hains AB, Chen C, Galanter J, Burgess BL, Cho W, Kerchner GA, Yeh FL, Ghosh AS, Cheeti S, Brooks L, Honigberg L, Couch JA, Rothenberg ME, Brunstein F, Sharma KR, van den Berg L, Berry JD, Glass JD. 2022. A Phase 1 study of GDC-0134, a dual leucine zipper kinase inhibitor, in ALS. Ann Clin Transl Neurol 9:50–66.

      Klinedinst S, Wang X, Xiong X, Haenfler JM, Collins CA. 2013. Independent pathways downstream of the Wnd/DLK MAPKKK regulate synaptic structure, axonal transport, and injury signaling. J Neurosci 33:12764–12778.

      Le K, Soth MJ, Cross JB, Liu G, Ray WJ, Ma J, Goodwani SG, Acton PJ, Buggia-Prevot V, Akkermans O, Barker J, Conner ML, Jiang Y, Liu Z, McEwan P, Warner-Schmidt J, Xu A, Zebisch M, Heijnen CJ, Abrahams B, Jones P. 2023. Discovery of IACS-52825, a potent and selective DLK inhibitor for treatment of chemotherapy-induced peripheral neuropathy. J Med Chem 66:9954–9971.

      Le Pichon CE, Meilandt WJ, Dominguez S, Solanoy H, Lin H, Ngu H, Gogineni A, Sengupta Ghosh A, Jiang Z, Lee S-H, Maloney J, Gandham VD, Pozniak CD, Wang B, Lee S, Siu M, Patel S, Modrusan Z, Liu X, Rudhard Y, Baca M, Gustafson A, Kaminker J, Carano RAD, Huang EJ, Foreman O, Weimer R, Scearce-Levie K, Lewcock JW. 2017. Loss of dual leucine zipper kinase signaling is protective in animal models of neurodegenerative disease. Sci Transl Med 9. doi:10.1126/scitranslmed.aag0394

      Li J, Zhang YV, Asghari Adib E, Stanchev DT, Xiong X, Klinedinst S, Soppina P, Jahn TR, Hume RI, Rasse TM, Collins CA. 2017. Restraint of presynaptic protein levels by Wnd/DLK signaling mediates synaptic defects associated with the kinesin-3 motor Unc-104. Elife 6. doi:10.7554/eLife.24271

      Miller BR, Press C, Daniels RW, Sasaki Y, Milbrandt J, DiAntonio A. 2009. A dual leucine kinase-dependent axon self-destruction program promotes Wallerian degeneration. Nat Neurosci 12:387–389.

      Nihalani D, Merritt S, Holzman LB. 2000. Identification of structural and functional domains in mixed lineage kinase dual leucine zipper-bearing kinase required for complex formation and stress-activated protein kinase activation. J Biol Chem 275:7273–7279.

      Russo A, DiAntonio A. 2019. Wnd/DLK is a critical target of FMRP responsible for neurodevelopmental and behavior defects in the Drosophila model of fragile X syndrome. Cell Rep 28:2581–2593.e5.

      Shin JE, Cho Y, Beirowski B, Milbrandt J, Cavalli V, DiAntonio A. 2012. Dual leucine zipper kinase is required for retrograde injury signaling and axonal regeneration. Neuron 74:1015–1022.

      Siu M, Sengupta Ghosh A, Lewcock JW. 2018. Dual Leucine Zipper Kinase Inhibitors for the Treatment of Neurodegeneration. J Med Chem 61:8078–8087.

      Valakh V, Walker LJ, Skeath JB, DiAntonio A. 2013. Loss of the spectraplakin short stop activates the DLK injury response pathway in Drosophila. J Neurosci 33:17863–17873.

      Verma S, Khurana S, Vats A, Sahu B, Ganguly NK, Chakraborti P, Gourie-Devi M, Taneja V. 2022. Neuromuscular junction dysfunction in amyotrophic lateral sclerosis. Mol Neurobiol 59:1502–1527.

      Wlaschin JJ, Donahue C, Gluski J, Osborne JF, Ramos LM, Silberberg H, Le Pichon CE. 2023. Promoting regeneration while blocking cell death preserves motor neuron function in a model of ALS. Brain 146:2016–2028.

      Xiong X, Collins CA. 2012. A conditioning lesion protects axons from degeneration via the Wallenda/DLK MAP kinase signaling cascade. J Neurosci 32:610–615.

      Xiong X, Hao Y, Sun K, Li J, Li X, Mishra B, Soppina P, Wu C, Hume RI, Collins CA. 2012. The Highwire ubiquitin ligase promotes axonal degeneration by tuning levels of Nmnat protein. PLoS Biol 10:e1001440.

      Xiong X, Wang X, Ewanek R, Bhat P, Diantonio A, Collins CA. 2010. Protein turnover of the Wallenda/DLK kinase regulates a retrograde response to axonal injury. J Cell Biol 191:211–223.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      We thank the three reviewers for the constructive suggestions made in the Public Reviews and the Recommendations to Authors. We have now addressed these comments in a revised manuscript as follows:

      (1) We have revised the text according to the reviewers' suggestions and provided more detailed explanations in the Results and Discussion.

      (2) We have uploaded higher resolution images of several figures (resolution had been reduced to achieve lower file sizes) to address the comment regarding “data quality”.

      (3) We have included additional data on eCLIP control experiments in the supplementary figures.

      (4) We have performed additional replications of the western blot analysis for Rbm20 knock-out animals and provided the data in a new Figure.

      Recommendations for the authors:

      Reviewer #1:

      (1) The study is missing CLIP-seq data from control mice that do not express HA, or that carry HA knocked into a safe-harbor locus. This is important because there is plenty of background HA staining in Figure S2B, in wild-type mice. Including this control would allow subsequent peak calling to distinguish between non-specific HA peaks and RBM20-specific peaks.

      The biochemical conditions used in immunostaining are much less stringent than the buffers employed for immunoprecipitation in the eCLIP protocol. Thus, background staining is not an informative reference for assessing the specificity of CLIP isolations. In previous experiments, we confirmed very low background with the anti-HA antibodies in our eCLIP protocol. In the present study, we used a "no-crosslinking control" in which samples were not irradiated with UV light. This negative control is now included in Supplementary Figure 4.

      (2) The GO analysis performed to infer synapse-gene specific regulation would be more useful if the authors would discuss specific genes that are represented within these terms and have been shown to be associated with neuronal function.

      We now name several of the identified synapse-related genes in the text.

      (3) Some figures would benefit from larger size and higher resolution including Fig S1, S3.

      We had previously embedded the figures as PNG files in the text document. In the revised version, we have uploaded the figures at higher resolution as individual JPEG files. Moreover, we have now split Figure S1 into two separate supplementary figures (new Fig. S2), which allowed us to enlarge the panels. We further enlarged the panels of (former) Fig. S3 (now Fig. S4).

      (4) RBP genes in Figure 1A x-axis are all lowercase. This is not standard mouse gene nomenclature.

      We corrected this.

      (5) Typo in Figure S4F rightmost panel y-axis - 'Length' is misspelled.

      We corrected this.

      Reviewer #2:

      Minor points:

      - Shortly explain DESEQ2 (p4)

      We have now added a brief note and the corresponding reference in the main text of the manuscript.

      - Is RBM20 a shuttling protein? Any detection in the cytoplasm?

      Our immunostainings for endogenous RBM20 in heart and olfactory bulb cells suggest that the vast majority of wild-type RBM20 is localized to the nucleus. Previous work on RBM20 disease mutants suggests that pathological forms can accumulate in the cytoplasm. However, at the sensitivity of our detection methods, we did not obtain evidence for a significant cytoplasmic pool in neurons. This does not exclude the possibility that the protein shuttles, but assessing this would require different types of experiments.

      Reviewer #3:

      (1) Figure 1C: It is shown that some of the RBM20 staining does not colocalize with PV. This observation requires further explanation and discussion to clarify its significance.

      As seen in the fluorescent in situ hybridizations as well as the RiboTRAP purifications (Fig. S1C,D), we observe Rbm20 mRNA expression not only in parvalbumin-positive interneurons but also in somatostatin-positive cells of the neocortex. Accordingly, some RBM20-positive cells do not express parvalbumin. We have now clarified this in the text.

      Additionally, in Figure S1C, the resolution of the image is low, making it difficult to conclusively determine whether RBM20 RNA is localized in the nucleus. A high-resolution image would be beneficial to address this ambiguity.

      The Rbm20 mRNA is localized in the nucleus and cytoplasm. We have now split Figure S1 into two separate figures to enlarge the panels for S1C and make this more visible. Moreover, we uploaded higher resolution figure files.

      (2) Figure 1E: The molecular weight of RBM20 is approximately 135 kDa, yet there is a band near 135 kDa in the KO heart. How do the authors determine that the 150 kDa band represents RBM20 rather than the 135 kDa band? The authors may consider increasing the sample size to confirm whether the smaller band consistently appears across all KO heart tissues.

      We appreciate that in this higher molecular weight range, the indicated weight markers may not be entirely accurate. We used a validated knock-out mouse line to identify the appropriate RBM20 protein band. As the 150 kDa band was reproducibly lost in knock-out brain and heart tissue, whereas the fainter, faster-migrating band (near 135 kDa) remained, we concluded that on our gel system RBM20 protein has an apparent molecular weight of 150 kDa. This is further supported by the fact that the endogenously tagged RBM20 protein also has a similar mobility.

      As suggested by the reviewer, we have now re-run Western blots from multiple wild-type and corresponding knock-out tissues. This further confirmed the migration of the protein and the loss of the 150 kDa band in the mutant mice (new Figure 1E).

      (3) Figure 2A: A higher-resolution image is recommended. Prior studies on RBM20 mutation knock-in mice suggest that when RBM20 localizes to the cytoplasm, it promotes molecular condensate formation. This seems to be the case in Figure 2A; however, the low image quality makes it difficult to see these molecular condensates.

      Figure 2A shows endogenous RBM20 (not the epitope-tagged protein in the knock-in mice). The vast majority of the protein is localized in the nucleus rather than the cytoplasm. We are uncertain which "condensates" the reviewer is referring to. In the heart, we indeed see accumulations of RBM20 in foci (as described previously in the literature). As judged by their location within the DAPI-positive area, these foci are in the nucleus. By contrast, in olfactory bulb neurons (which express lower levels of RBM20), we do not see a comparable concentration in nuclear foci but rather broad and diffuse staining. This is consistent with the hypothesis that the nuclear foci depend on the expression of highly expressed target transcripts such as titin. To better visualize this, we have now uploaded higher-resolution files for the revised manuscript.

      (4) Figure 4D: This figure is not cited in the main text and should be referenced appropriately.

      We corrected this.

      (5) Page 5: The sentence "Finally, introns bound by RBM20 were significantly longer than expected by chance as assed..." contains a typo. The word "assed" should be corrected to "assessed".

      We corrected this.

      (6) Functional data: The study would benefit from functional experiments to elucidate the physiological role of RBM20 in PV neurons. For instance, since RBM20 regulates calcium-handling genes in neurons, does its absence impair calcium signaling in PV neurons? Additionally, given that RBM20 is involved in synaptic regulation, could RBM20 KO disrupt synaptic function? While it may not be feasible to address all these questions, providing some functional data would greatly enhance the overall significance of the study.

      We completely agree with the reviewer that this would greatly advance the study, and the lack of data on cellular functions is the most significant limitation of this work. We attempted to obtain insights into cellular function through the structural investigations (Fig. S5). We had obtained some data on a behavioral phenotype indicating that knock-out in vGLUT2 neurons precipitates behavioral alterations in the mice. However, due to conditions in our animal facility (emissions from construction), we struggled to solidify and confirm these data. Thus, in the interest of sharing the existing data in a timely manner, we felt that more elaborate functional studies on synaptic transmission or calcium imaging would be better performed in a separate effort.

    1. Author response:

      On the control of taxonomic versus thematic information. Both reviewers had questions about the relationship between the focus of the meta-analysis (the control of responses based on taxonomic versus thematic relationships) and the simulation. Both the model and the meta-analysis focus on the same mechanism: the controlled selection of task-appropriate features. In the case of the meta-analysis, these were the features and associations needed to identify the taxonomic or thematic relationships. As reviewer 1 notes, one possibility is that these kinds of structures are represented in distinct cortical regions. For instance, Mirman, Schwartz and colleagues have suggested that temporoparietal regions may preferentially support thematic knowledge while temporal regions may preferentially support taxonomic knowledge. Alternatively, they may be supported by different features instantiated within the same regions. However, whether or not taxonomic and thematic relationships require access to features in different regions is not crucial to the conclusions of this paper. The simulations used here happen to select features based on their inclusion in a particular sensory modality, yet they could learn to select any combination of features. Indeed, prior simulations using the Jackson et al. (2021) model show that the functional impact on learning of “deep” conceptual representations (together with controlled behaviours) is the same regardless of whether the potentiated features are localised within one spoke or distributed across spokes. Thus, the key results regarding the acquisition of semantic knowledge before the maturation of control in the current work should hold regardless of whether knowledge of taxonomic and thematic relations is localised to different anatomical regions.

      On model size and scalability. Both reviewers noted the relatively small size of the model and wondered about implications for the ecological validity of the simulations and for scalability to larger, noisier, and potentially more systematically structured training environments. We agree this is an important direction for future research, but one that faces two nontrivial challenges. First, reviewer 1 notes that, whereas our model environment employs orthogonal structures across spokes and for the cross-modal features, perceptual structure may be better aligned with conceptual structure in real-world experience. While we appreciate the intuition, its validity depends largely on how visual information about objects is encoded. Conceptual structure is certainly not apparent, for instance, in the distance between bitmap images of objects, nor in the overlap of simple feature-extraction algorithms (such as edge detection or Fourier decomposition). Even in this age of deep vision models, it remains unclear how the visual system extracts and discerns perceptual similarity from retinal input (see e.g. Mukherjee & Rogers, 2025). Most successful contemporary models train neural networks to assign visual images to semantic categories, suggesting that the visual features the model learns, and thus the perceptual similarities it represents, depend on learning to generate semantic information. Therefore, it is not clear whether the similarity that people perceive amongst instances of the same class is natively apparent in the bottom-up visual input, or whether it depends on semantic/cross-modal learning and representation. It should also be noted that within our training environment, there are features in each modality that are predictive of features in other modalities, as well as some that are only predictive of features within that modality. Thus, the full cross-modal conceptual structure is not orthogonal to the information available in each sensory domain; instead, as in the real-world environment, there is a relationship between surface and multimodal similarity in the dataset. In general, one virtue of the small-scale modelling endeavour in the current work is that we can be very explicit about the nature of the structure apparent within and across spokes.

      The second non-trivial issue concerns the nature of the mechanisms that allow for context-sensitive responding in large-scale language/vision models such as GPT 4. Such models are trained on web-scale language and vision and provide a means of simulating controlled behaviour with realistic stimuli, so might seem to provide a means of assessing scalability of current neuro-cognitive models. Large language/vision models rely, however, on transformer architectures whose relationship to hypothesized mechanisms of control in the mind and brain is unclear. In transformers, context-sensitive responding depends upon “attention” mechanisms that are fully distributed and integrated throughout the entire system—there is no distinction between control, representation, and short-term memory in the architecture. As a consequence, it is very difficult to understand why a model behaves the way it does, or to relate patterns of behaviour to hypothesised mechanisms in the human mind/brain. Yet transformers are currently the only models capable of exhibiting context-sensitive patterns of responding based on both language and vision. Scaling up neuro-cognitive models will require developing alternative architectures that preserve the critical hypothesised distinctions between representation and control while retaining the ability of transformers to learn from large-scale ecologically realistic corpora of language and images. In the meantime, small-scale simulations like those reported here provide some critical insights into aspects of architecture and maturation that may aid in this endeavour.

      On including a response layer. Reviewer 1 notes that our model does not separately simulate response-generation and the selective activation of relevant feature representations. We agree that there are interesting questions about how feature-potentiation and response-generation relate to one another, and that incorporating response selection in the current model would significantly complicate the analysis. The general idea that control potentiates/suppresses task-relevant feature representations in addition to simply promoting the correct response derives from classic work by Martin and others (e.g., Martin et al., 1995) showing that, for instance, regions involved in colour perception activate more strongly in tasks requiring retrieval of colour than tasks involving retrieval of action and vice versa—results consistent with the model training/testing procedure in the current work. In general, it may be counterproductive to become aware of aspects of a concept that would be irrelevant, or even actively unhelpful in making a response, suggesting guided activation is a necessary precursor to response selection (Botvinick & Cohen, 2014). Here, we focus on this important feature potentiation step.

      On the novelty of the meta-analysis. Reviewer 2 suggests the results of the meta-analysis were already known and provided motivation for the simulation. However, an important contribution of the current work is the observation that, in fact, there is little prior work on the development of semantic control. The widely known developmental delay in domain-general executive control, which did indeed motivate the study, is based exclusively on tasks requiring very different forms of executive control. Many of these involve no meaningful stimuli or require the child to completely inhibit a practiced response and generate an opposite or completely arbitrary response, instead of requiring the child to use context to select among two or more meaningful behaviours that are equally valid in different contexts (see the introduction to Part 2). This observation, coupled with recent evidence that semantic control relies on dedicated neural systems that only partially overlap with those supporting executive function, illustrates the utility of the current meta-analysis: delineating the developmental trajectory of semantic control requires a task in which control is applied to the context-appropriate retrieval and manipulation of semantic knowledge, such as the triadic matching task. Moreover, the results show that semantic control, while arising later than semantic representation, nevertheless begins to mature earlier (around 2.5 years) than typical estimates of domain-general executive control (around 4 years). Thus, the meta-analysis contributes to our understanding of cognitive development while also testing a key prediction of the model.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Pavel et al. analyzed a cohort of atrial fibrillation (AF) patients from the University of Illinois at Chicago, identifying TTN truncating variants (TTNtvs) and TTN missense variants (TTNmvs). They reported a rare TTN missense variant (T32756I) associated with adverse clinical outcomes in AF patients. To investigate its functional significance, the authors modeled the TTN-T32756I variant using human induced pluripotent stem cell-derived atrial cardiomyocytes (iPSC-aCMs). They demonstrated that mutant cells exhibit aberrant contractility, increased activity of the cardiac potassium channel KCNQ1 (Kv7.1), and dysregulated calcium homeostasis. Interestingly, these effects occurred without compromising sarcomeric integrity. The study further identified increased binding of the titin-binding protein Four-and-a-Half Lim domains 2 (FHL2) with KCNQ1 and its modulatory subunit KCNE1 in the TTN-T32756I iPSC-aCMs.

      Strengths:

      This work has translational potential, suggesting that targeting KCNQ1 or FHL2 could represent a novel therapeutic strategy for improving cardiac function. The findings may also have broader implications for treating patients with rare, disease-causing variants in sarcomeric proteins and underscore the importance of integrating genomic analysis with experimental evidence to advance AF research and precision medicine.

      Weaknesses

      (1) Variant Identification: It is unclear how the TTN missense variant (T32756I) was identified using REVEL, as none of the patients' parents reportedly carried the mutation or exhibited AF symptoms. Are there other TTN variants identified in the three patients carrying TTN-T32756I? Clarification on this point is necessary.  

      We thank the reviewer for their insightful comment. Our study identified deleterious missense variants using a stringent REVEL score threshold of ≥0.7; however, variants with a REVEL score above 0.5 are generally considered potentially pathogenic (Ioannidis NM, et al., Am J Hum Genet 2016;99(4):877–885). The TTN-T32756I variant (REVEL score: 0.58758, Supplementary Table 1) was prioritized because it occurred in multiple unrelated individuals within our clinical AF cohort, despite no reported family history of AF in affected individuals. While no parental inheritance was observed, the possibility of a de novo origin cannot be excluded. Furthermore, this variant is located within a region overlapping a deletion mutation recently shown to cause AF in a zebrafish model (Jiang et al., iScience, 2024;27(7):110395), supporting its potential pathogenicity. Notably, the affected individuals did not carry additional loss-of-function TTN variants. We will clarify these points in the revised manuscript.

      (2) Patient-Specific iPSC Lines: Since the TTN-T32756I variant was modeled using only one healthy iPSC line, it is unclear whether patient-specific iPSC-derived atrial cardiomyocytes would exhibit similar AF-related phenotypes. This limitation should be addressed.

      We acknowledge the reviewer’s concern that patient-specific iPSC lines could further validate our findings. However, because peripheral blood mononuclear cells (PBMCs) from the patients were unavailable, we utilized a healthy iPSC line and introduced the TTN-T32756I variant using CRISPR/Cas9 genome editing. This approach ensures an isogenic background, thereby minimizing genetic variability and providing a controlled system to study the direct effects of the mutation. We will acknowledge this limitation in the revised manuscript.

      (3) Hypertension as a Confounding Factor: The three patients carrying TTN-T32756I also have hypertension. Could the hypertension associated with this variant contribute secondarily to AF? The authors should discuss or rule out this possibility.

      We agree that hypertension is a common comorbidity in patients with AF and could contribute to disease progression. However, all three individuals carrying TTN-T32756I exhibited early-onset AF (onset before 66 years), with one case occurring as early as 36 years. This suggests a potential two-hit mechanism in which genetic predisposition and comorbidities jointly influence disease risk. Importantly, our iPSC model isolates the genetic effects of TTN-T32756I from other factors, supporting a direct pathogenic role. We will explicitly discuss this in the revised manuscript.

      (4) FHL2 and KCNQ1-KCNE1 Interaction: Immunostaining data demonstrating the colocalization of FHL2 with the KCNQ1-KCNE1 (MinK) complex in TTN-T32756I iPSC-aCMs are needed to strengthen the mechanistic findings.

      We appreciate the reviewer’s suggestion and agree that additional immunostaining data would strengthen the evidence for FHL2 colocalization with the KCNQ1-KCNE1 complex in TTN-T32756I iPSC-aCMs. We will work on obtaining these additional data to validate our mechanistic findings further.

      (5) Functional Characterization of FHL2-KCNQ1-KCNE1 Interaction: To further validate the proposed mechanism, additional functional assays are necessary to characterize the interaction between FHL2 and the KCNQ1-KCNE1 complex in TTN-T32756I iPSC-aCMs.

      We agree with the reviewer that additional functional assays would further validate the proposed mechanism. We will perform contractility and electrophysiological experiments, such as multielectrode array (MEA) assays, to better characterize the interaction between FHL2 and the KCNQ1-KCNE1 complex in TTN-T32756I iPSC-aCMs.

      Reviewer #2 (Public review):

      Summary:

      The authors present data from a single-center cohort of African-American and Hispanic/Latinx individuals with atrial fibrillation (AF). This study provides insight into the incidences and clinical impact of missense variants in this population in the Titin (TTN) gene. In addition, the authors identified a single amino acid TTN missense variant (TTN-T32756I) that was further studied using human induced pluripotent stem cell-derived atrial cardiomyocytes (iPSC-aCMs). These studies demonstrated that the Four-and-a-Half Lim domains 2 (FHL2) has increased binding with KCNQ1 and its modulatory subunit KCNE1 in the TTN-T32756I-iPSC-aCMs, enhancing the slow delayed rectifier potassium current (IKs), which is a potential mechanism for atrial fibrillation. Finally, the authors demonstrate that suppression of FHL2 could normalize the IKs current.

      Strengths:

      The strengths of this manuscript/study are listed below:

      (1) This study includes a previously underrepresented population in the study of the genetic and mechanistic basis of AF.

      (2) The authors utilize current state-of-the-art methods to investigate the pathogenicity of a specific TTN missense variant identified in this underrepresented patient population.

      (3) The findings of this study identify a potential therapeutic target for treating atrial fibrillation.

      Weaknesses:

      (1) The authors do not include a non-AF group when evaluating the incidence and clinical significance of TTN missense variants in AF patients.

      We acknowledge the limitation of not including a non-AF group in our clinical analysis. Our cohort is derived from a single-center registry of individuals with AF, and we do not have a matched cohort of non-AF controls to compare the incidence of TTN missense variants. We recognize this as a limitation and will clarify that further studies are needed to define the prevalence of TTN missense variants in broader, multiethnic cohorts that include both AF and non-AF individuals.

      (2) The authors do not provide evidence that TTN-T32756I-iPSC-aCMs are arrhythmogenic, only that there is an increase in the IKs current and associated action potential changes. More specifically, the authors report that "compared to the WT, TTN-T32756I-iPSC-aCMs exhibited increased arrhythmic frequency," yet it is unclear what they are referring to by "arrhythmic frequency."

      We appreciate the reviewer’s request for clarification regarding "arrhythmic frequency." In our study, this term refers to the increased spontaneous beating rate and irregular action potentials observed in TTN-T32756I iPSC-aCMs compared to WT. Our findings suggest that the AF-associated TTN-T32756I variant induces ion channel remodeling and beating abnormalities, possibly contributing to an arrhythmogenic substrate for AF. We will refine our wording in the revised manuscript to enhance clarity and precision.

      (3) There seem to be discrepancies regarding the impact of the TTN-T32756I variant on mechanical function. Specifically, the authors report "both reduced contraction and abnormal relaxation in TTN-T32756I-iPSC-aCMs" yet, separately report "the contraction amplitude of the mutant was also increased … suggesting an increased contractile force by the TTN-T32756I-iPSC-aCMs and TTN-T32756I-iPSC-CMs exhibited similar calcium transient amplitudes as the WT."

      We thank the reviewer for pointing this out and apologize for the inconsistency. We intended to report on contraction duration and relaxation rather than contraction force alone. The increased contraction amplitude reflects altered contractile force, whereas the reduced contraction duration and impaired relaxation indicate dysfunctional contractile dynamics. We will revise the text and corresponding figures to convey these findings accurately.

      Reviewer #3 (Public review):

      Summary:

      The authors describe the abnormal contractile function and cellular electrophysiology in an iPSC model of atrial myocytes with a titin missense variant. They provide contractility data by sarcomere length imaging, calcium imaging, and voltage clamp of the repolarizing current IKs. While each of the findings is interesting, the paper comes across as too descriptive because there is no data merging to support a cohesive mechanistic story/statement, especially from the electrophysiological standpoint. There is not enough support for the title "A Titin Missense Variant Causes Atrial Fibrillation", since there is no strong causative evidence. There is some interesting clinical data regarding the variant of interest and its association with HF hospitalization, which may lead to future important discoveries regarding atrial fibrillation.

      Strengths:

      The manuscript is well written, and a wide range of experimental techniques are used to probe this atrial fibrillation model.

      Weaknesses:

      (1) While the clinical data is interesting, it is essential to rule out heart failure with preserved EF as a confounder. HFpEF leads to AF due to increased atrial remodeling, so the fact that patients with this missense variant have increased HF hospitalizations does not necessarily directly support the variant as causative of AF. It could be that the variant is associated directly with HFpEF instead, and this needs to be addressed and corrected in the analyses.

      We recognize that AF and HFpEF frequently coexist and that HFpEF-related atrial remodeling could contribute to AF development. The primary aim of our cohort analysis was to explore the potential clinical significance of TTNmv. While we acknowledge the inherent limitations of retrospective observational data in establishing causality, our subsequent in vitro experiments were designed to demonstrate that TTNmv can alter the electrophysiological substrate, potentially predisposing individuals to AF.

      As HFpEF is a potential confounder, it is reasonable to consider whether TTNmv may also be associated with HFpEF. However, to our knowledge, no existing literature directly links TTNmv to HFpEF. In contrast, loss-of-function TTN variants are typically associated with heart failure with reduced ejection fraction (HFrEF) and dilated cardiomyopathy, and even their role in HFrEF remains controversial. To address potential confounding, our multivariable analysis for clinical outcomes was adjusted for reduced ejection fraction, and we conducted a sensitivity analysis excluding patients with nonischemic dilated cardiomyopathy (Supplementary Table 6). We will clarify these points in the revised manuscript.

      (2) All contractility and electrophysiologic data should be done with pacing at the same rate in both control and missense variant groups, to control for the effect of cycle length on APD and calcium loading. A shorter APD cannot be claimed when the firing rate of one set of cells is much faster than the other, since shorter APD is to be expected with a quicker rate. Similarly, contractility is affected by diastolic interval because of the influence of SR calcium content on the myocyte power stroke. So the cells need to be paced at the same rate in the IonOptix for any direct comparison of contractility. The authors should familiarize themselves with the concept of electrical restitution.

      We appreciate the reviewer’s technical concern. iPSC-derived cardiomyocytes (iPSC-CMs) exhibit spontaneous beating due to the presence of pacemaker-like currents and the absence of IK1, which allows for the study of intrinsic electrophysiological properties, ion channel function, and disease modeling. In our study, we utilized this unique property of iPSC-CMs to test our hypothesis that TTNmvs alter electrophysiological properties through ion channel remodeling.

      While iPSC-CMs with identical backgrounds are expected to show comparable electrophysiological phenotypes under the same conditions, variability due to biological and technical factors (e.g., protein expression and culture handling) can result in differences between samples. We agree with the reviewer that pacing iPSC-CMs at the same rate for action potential duration (APD) and contractility measurements will control for cycle length effects and improve the reliability and interpretability of our findings. We will incorporate this approach into our revised experimental design.

      (3) It is interesting that the firing rate of the myocytes is faster with the missense variant. This should lead to a hypothesis and investigation of abnormal automaticity or triggered activity, which may also explain the increased contractility since all these mechanisms are related to the SR's calcium clock and calcium loading. See #2 above for suggestions on how to probe calcium handling adequately. Such an investigation into impulse initiation mechanisms would be compelling in supporting the primary statement of the paper since these are actual mechanisms thought to cause AF.

      We agree with the reviewer that investigating abnormal automaticity or triggered activity in relation to the increased firing rate observed with the missense variant could provide valuable insights into the mechanisms underlying AF. As these processes are closely linked to calcium handling and the calcium clock, probing calcium cycling abnormalities could strengthen our understanding of how TTNmvs contribute to AF. We will incorporate additional experiments to investigate these mechanisms, further supporting our study's central hypothesis.

      (4) The claim of shortened APD without correcting for cycle length is problematic. However, linking shortened APD in isolated cells alone to AF causation is more complicated. To have a setup for reentry, there must be a gradient of APD from short to long, and this can only be demonstrated at the tissue level, not at the cellular level, so reentry should not be invoked here. If shortened APD is demonstrated with correction of the cycle length problem, restitution curves can be made showing APD shortening at different cycle lengths. If restitution is abnormal (i.e. the APD does not shorten normally in relation to the diastolic interval), this may lead to triggered activity which is an arrhythmogenic mechanism. This would also tie in well with the finding of abnormally elevated iKs current since iKs is a repolarizing current directly responsible for restitution.

      We appreciate the reviewer’s insightful comment. We recognize that isolated cell studies cannot directly demonstrate reentrant circuits, and we agree that reentry should not be invoked solely based on cellular data. Our claim of shortened APD is based on observed abnormalities in APD and beating patterns, which may contribute to conditions conducive to reentry at the tissue level. We will clarify this distinction in the revised manuscript and refrain from directly linking APD shortening to reentry without tissue-level evidence.
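
      To illustrate the restitution analysis suggested in point (4): at steady-state pacing, the diastolic interval (DI) is the cycle length (CL) minus the APD, and a restitution curve plots APD against DI across several pacing rates. The minimal Python sketch below computes DIs and finite-difference restitution slopes from (CL, APD) pairs. The numbers are hypothetical placeholders, not data from this study, and the function names are illustrative only.

```python
def diastolic_intervals(cycle_lengths_ms, apds_ms):
    """DI = CL - APD for each paced cycle length (all values in ms)."""
    return [cl - apd for cl, apd in zip(cycle_lengths_ms, apds_ms)]


def restitution_slopes(dis_ms, apds_ms):
    """Finite-difference slope dAPD/dDI between neighbouring points.

    Points are sorted by DI first; DIs are assumed to be distinct.
    """
    points = sorted(zip(dis_ms, apds_ms))
    return [
        (apd2 - apd1) / (di2 - di1)
        for (di1, apd1), (di2, apd2) in zip(points, points[1:])
    ]


# Hypothetical steady-state pacing data (NOT from this study), in ms.
cycle_lengths = [1000, 800, 600, 400]
apds = [300, 280, 250, 200]

dis = diastolic_intervals(cycle_lengths, apds)  # [700, 520, 350, 200]
slopes = restitution_slopes(dis, apds)          # ordered from shortest to longest DI
print(dis, [round(s, 3) for s in slopes])
```

      Pacing WT and variant cells at the same set of cycle lengths is what makes such curves directly comparable between the two groups.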

    1. Author response:

      Our reviewers brought three things to our notice:

      (1) PolyP has not been introduced as an abbreviation in the abstract.

      (2) 'colorimetric' is misspelled as 'calorimetric' in the following sentence of the results section.

      This method involved the digestion of polyP by recombinant S. cerevisiae exopolyphosphatase 1 (ScPpx1) followed by calorimetric measurement of the released Pi by malachite green.

      (3) Due to the same technical glitch, a reference for hNUDT3 was deleted from the following sentence of the introduction.

      Recently, biochemical experiments led to the discovery of endopolyphosphatase NUDT3, an enzyme known as a dinucleoside phosphatase.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The questions after reading this manuscript are what novel insights have been gained that significantly go beyond what was already known about the interaction of these receptors and, more importantly, what are the physiological implications of these findings? The proposed significance of the results in the last paragraph of the Discussion section is speculative since none of the receptor interactions have been investigated in TNBC cell lines. Moreover, no physiological experiments were conducted using the PRLR and GH knockout T47D cells to provide biological relevance for the receptor heteromers. The proposed role of JAK2 in the cell surface distribution and association of both receptors as stated in the title was only derived from the analysis of box 1 domain receptor mutants. A knockout of JAK2 was not conducted to assess heteromers formation.

      We thank the reviewer for these comments. The novel insight is that two different cytokine receptors can interact in an asymmetric, ligand-dependent manner, such that one receptor regulates the other receptor’s surface availability, mediated by JAK2. To our knowledge this has not been reported before. Beyond our observations, there is the question of whether this could be a much more common regulatory mechanism and whether it has therapeutic relevance. However, answering these questions is beyond the scope of this work.

      Along the same line, the question regarding the biological relevance of our receptor heteromers and JAK2’s role in cell surface distribution is undoubtedly very important. Studying GHR-PRLR cell surface distributions in JAK2 knockout cells and certain TNBC cell lines as proposed by the reviewer could perhaps be insightful. However, most TNBCs down-regulate PRLR [1], so we would first have to identify TNBC cell lines that actually express PRLR at sufficiently high levels. Moreover, knocking out JAK2 is known to significantly reduce GHR surface availability [2,3], such that the proposed experiment would probably provide only limited insights.

      Unfortunately, our team is currently not in the position to perform any experiments (due to lack of funding and shortage of personnel). However, to address the reviewer’s comment as much as possible, we have revised the respective paragraph of the discussion section to emphasize the speculative nature of our statement and have added another paragraph discussing shortcomings and future experiments (see revised manuscript, pages 23-24).

      (1) López-Ozuna, V., Hachim, I., Hachim, M. et al. Prolactin Pro-Differentiation Pathway in Triple Negative Breast Cancer: Impact on Prognosis and Potential Therapy. Sci Rep 6, 30934 (2016). https://www.nature.com/articles/srep30934

      (2) He, K., Wang, X., Jiang, J., Guan, R., Bernstein, K.E., Sayeski, P.P., Frank, S.J. Janus kinase 2 determinants for growth hormone receptor association, surface assembly, and signaling. Mol Endocrinol. 2003;17(11):2211-27. doi: 10.1210/me.2003-0256. PMID: 12920237.

      (3) He, K., Loesch, K., Cowan, J.W., Li, X., Deng, L., Wang, X., Jiang, J., Frank, S.J. Janus kinase 2 enhances the stability of the mature growth hormone receptor. Endocrinology. 2005;146(11):4755-65. https://doi.org/10.1210/en.2005-0514

      (2) Except for some investigation of γ2A-JAK2 cells, most of the experiments in this study were conducted on a single breast cancer cell line. In terms of rigor and reproducibility, this is somewhat borderline. The CRISPR/Cas9 mutant T47D cells were not used for rescue experiments with the corresponding full-length receptors and the box1 mutants. A missed opportunity is the lack of an investigation correlating the number of receptors with physiological changes upon ligand stimulation (e.g., cellular clustering, proliferation, downstream signaling strength).

      We appreciate the reviewer’s comments. While we are confident in the reproducibility of our findings, including those obtained in the T47D cell line, we acknowledge that testing in additional cell lines would have strengthened the generalizability of our results. We also recognize that performing a rescue experiment using our T47D hPRLR or hGHR KO cells would have been valuable. Furthermore, examining physiological changes, such as proliferation rates and downstream signaling responses, would have provided additional insights. Unfortunately, these experiments were not conducted at the time, and we currently lack the resources to carry them out.

      (3) An obvious shortcoming of the study that was not discussed seems to be that the main methodology used in this study (super-resolution microscopy) does not distinguish the presence of various isoforms of the PRLR on the cell surface. Is it possible that the ligand stimulation changes the ratio between different isoforms? Which isoforms besides the long form may be involved in heteromers formation, presumably all that can bind JAK2?

      This is a very good point. We fully agree with the reviewer that a discussion of the results in the light of different PRLR isoforms is appropriate. We have added information on PRLR isoforms to the Introduction (see revised manuscript, page 2) and Discussion sections (see revised manuscript, pages 23-24).

      (4) Changes in the ligand-inducible activation of JAK2 and STAT5 were not investigated in the T47D knockout models for the PRL and GHR. It is also a missed opportunity to use super-resolution microscopy as a validation tool for the knockouts on the single cell level and how it might affect the distribution of the corresponding other receptor that is still expressed.

      We thank the reviewer for this comment. We fully agree that such additional experiments could be very valuable. We are sorry but, as already mentioned above, this is not something we are able to address at this stage due to lack of personnel and funding. However, we do hope to address these and other proposed experiments in the future.

      (5) Why does the binding of PRL not cause a similar decrease (internalization and downregulation) of the PRLR, and instead, an increase in cell surface localization? This seems to be contrary to previous observations in MCF-7 cells (J Biol Chem. 2005 October 7; 280(40): 33909-33916).

      It has been recently reported for GHR that not only JAK2 but also LYN binds to the box1-box2 region, creating competition that results in divergent signaling cascades and affects GHR nanoclustering [1]. So, it is reasonable to assume that similar mechanisms may be at work that regulate PRLR cell surface availability. Differences in cells’ expression of such kinases could perhaps play a role in the perceived inconsistency. Also, Lu et al. [2] studied the downregulation of the long PRLR isoform in response to PRL; no other PRLR isoforms were detectable in MCF-7 cells. Differences between MCF-7 and T47D cells may therefore underlie this perceived contradiction.

      At this stage, we can only speculate about the actual reasons for these seemingly contradictory results. However, for full transparency, we are now mentioning this apparent contradiction in the Discussion section (see page 23) and have added the references below.

      (1) Chhabra, Y., Seiffert, P., Gormal, R.S., et al. Tyrosine kinases compete for growth hormone receptor binding and regulate receptor mobility and degradation. Cell Rep. 2023;42(5):112490. doi: 10.1016/j.celrep.2023.112490. PMID: 37163374.

      https://www.cell.com/cell-reports/pdf/S2211-1247(23)00501-6.pdf

      (2) Lu, J.C., Piazza, T.M., Schuler, L.A. Proteasomes mediate prolactin-induced receptor down-regulation and fragment generation in breast cancer cells. J Biol Chem. 2005 Oct 7;280(40):33909-16. doi: 10.1074/jbc.M508118200. PMID: 16103113; PMCID: PMC1976473.

      (6) Some figures and illustrations are of poor quality and were put together without paying attention to detail. For example, in Fig 5A, the GHR was cut off, possibly to omit other nonspecific bands, the WB images look 'washed out'. 5B, 5D: the labels are not in one line over the bars, and what is the point of showing all individual data points when the bar graphs with all annotations and SD lines are disappearing? As done for the γ2A cells, the illustrations in 5B-5E should indicate what cell lines were used. No loading controls in Fig 5F, is there any protein in the first lane? No loading controls in Fig 6B and 6H.

      We thank the reviewer for pointing this out. We have amended Fig. 5A to now show larger crops of the two GHR and PRLR Western Blot images and thus a greater range of proteins present in the extracts. Please note that the bands in the WBs other than what is identified as GHR and PRLR are non-specific and reflect roughly equivalent loading of protein in each lane.

      We also made some changes to Figures 5B-5E.

      (7) The proximity ligation method was not described in the M&M section of the manuscript.

      We thank the reviewer for pointing this out. We have added a description of the PL method to the Methods section.

      Reviewer #1 (Recommendations for the Authors):

      A final suggestion for future investigations: Instead of focusing on the heteromer formation of the GHR/PRLR which both signal all through the same downstream effectors (JAK2, STAT5), it would have been more cancer-relevant, and perhaps even more interesting, to look for heteromers between the PRLR and receptors of the IL-6 family since it had been shown that PRL can stimulate STAT3, which is a unique feature of cancer cells. If that is the case, this would require a different modality of the interaction between different JAK kinases.

      We highly appreciate the reviewer’s recommendation and hope to follow up on it in the near future.

      Reviewer #2 (Public Review):

      (1) I could not fully evaluate some of the data, mainly because several details on acquisition and analysis are lacking. It would be useful to know what the background signal was in dSTORM and how the authors distinguished the specific signal from unspecific background fluorescence, which can be quite prominent in these experiments. Typically, one would evaluate the signal coming from antibodies randomly bound to a substrate around the cells to determine the switching properties of the dyes in their buffer and the average number of localisations representing one antibody. This would help evaluate if GHR or PRLR appeared as monomers or multimers in the plasma membrane before stimulation, which is currently a matter of debate. It would also provide better support for the model proposed in Figure 8.

      We are grateful for the reviewer’s comment. In our experience, the background signal is more relevant in dSTORM when imaging proteins that are located at deeper depths (> 3 μm) above the coverslip surface. In our experiments, cells are attached to the coverslip surface and the proteins being imaged are on the cell membrane. In addition, we employed dSTORM’s TIRF (total internal reflection fluorescence) microscopy mode to image membrane receptor proteins. TIRFM exploits the unique properties of an induced evanescent field in a limited specimen region immediately adjacent to the interface between two media having different refractive indices. It thereby dramatically reduces background by rejecting fluorescence from out-of-focus areas in the detection path and illuminating only the area right near the surface.

      Having said that, other sources such as autofluorescence, scattering, and non-bleached fluorescent molecules both close to and distant from the focal plane can contribute to the background signal. We tried to reduce autofluorescence by growing cells in phenol-red-free media and imaging in STORM buffer, which reduces autofluorescence; in addition, our immunostaining protocol includes a quenching step as well as blocking with a serum-containing buffer supplemented with BSA. Moreover, we used extensive washing steps after antibody incubations to remove non-specifically bound antibodies. Ensuring that the TIRF illumination field is uniform helps reduce scatter. Finally, an extended bleaching step before acquiring the frames used for localization further reduced the number of non-bleached fluorescent molecules.

      In short, due to the experimental design we do not expect much background. However, in the future, we will address this concern and estimate background in a subtype dependent manner. To this end we will distinguish two types of background noise: (A) background with a small change between subsequent frames, which mainly consists of auto-fluorescence and non-bleached out-of-focus fluorescent molecules; and (B) background that changes every imaging frame, which is mainly from non-bleached fluorescent molecules near the focal plane. For type (A) background, temporal filters must be used for background estimation [1]; for type (B) background, low-pass filters (e.g., wavelet transform) should be used for background estimation [2].

      (1) Hoogendoorn, Crosby, Leyton-Puig, Breedijk, Jalink, Gadella, and Postma (2014). The fidelity of stochastic single-molecule super-resolution reconstructions critically depends upon robust background estimation. Scientific reports, 4, 3854. https://doi.org/10.1038/srep03854

      (2) Patel, Williamson, Owen, and Cohen (2021). Blinking statistics and molecular counting in direct stochastic reconstruction microscopy (dSTORM). Bioinformatics, Volume 37, Issue 17, September 2021, Pages 2730–2737, https://doi.org/10.1093/bioinformatics/btab136
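
      The type (A) estimate described above can be sketched as a sliding temporal median: for each pixel, the background in a frame is taken as the median intensity over neighbouring frames, so brief blinking events are largely ignored while slowly varying autofluorescence is captured. The Python sketch below is a generic illustration of this kind of temporal filter under assumed inputs (small stacks stored as nested lists, an odd window size). It is not the exact algorithm of [1] or of any particular software package, and the function names are hypothetical.

```python
from statistics import median


def temporal_median_background(frames, window):
    """Per-pixel background estimate from a sliding temporal median.

    frames: list of 2D frames (lists of rows), all with the same shape.
    window: odd number of frames in the window; it is truncated at the
    start and end of the stack.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n_frames = len(frames)
    n_rows, n_cols = len(frames[0]), len(frames[0][0])
    backgrounds = []
    for t in range(n_frames):
        start, stop = max(0, t - half), min(n_frames, t + half + 1)
        backgrounds.append([
            [median(frames[k][r][c] for k in range(start, stop))
             for c in range(n_cols)]
            for r in range(n_rows)
        ])
    return backgrounds


def subtract_background(frames, backgrounds):
    """Subtract the background frame by frame, clipping negatives to zero."""
    return [
        [[max(0.0, p - b) for p, b in zip(f_row, b_row)]
         for f_row, b_row in zip(frame, bg)]
        for frame, bg in zip(frames, backgrounds)
    ]


# One pixel over five frames: constant background of 10 with a blink in frame 2.
stack = [[[v]] for v in [10, 10, 50, 10, 10]]
bg = temporal_median_background(stack, window=5)
signal = subtract_background(stack, bg)
print([f[0][0] for f in signal])
```

      A temporal median suits type (A) background because it changes little between frames; type (B) background, which changes every frame, would instead call for the spatial low-pass filtering mentioned above.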

      (2) Since many of the findings in this work come from the evaluation of localisation clusters, an image showing actual localisations would help support the main conclusions. I believe that the dSTORM images in Figures 1 and 2 are density maps, although this was not explicitly stated. Alexa 568 and Alexa 647 typically give a very different number of localisations, and this is also dependent on the concentration of BME. Did the authors take that into account when interpreting the results and creating the model in Figures 2 and 8?

      I believe that including this information is important as findings in this paper heavily rely on the number of localisations detected under different conditions.

      Including information on proximity labelling and CRISPR/Cas9 in the methods section would help with the reproducibility of these findings by other groups.

      Figures 1 and 2 show Gaussian interpolations of actual localizations, not density maps. Imaging captured the fluorophores’ blinking events, and localizations were counted as true localizations when at least 5 consecutive blinking events had been observed. Nikon software was used for Gaussian fitting. In other words, we show reconstructed images based on identifying true localizations using Gaussian fitting and strict parameters to identify true fluorophore blinking. This allowed us to identify true localizations with high confidence and generate a high-resolution image for membrane receptors.
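
      One way to read the acceptance criterion above ("at least 5 consecutive blinking events") is that a candidate emitter must be detected in at least five consecutive camera frames. The Python sketch below implements that reading as an assumption; it does not reproduce the Nikon software, whose internal criteria are not described here, and the candidate IDs and function names are hypothetical.

```python
def longest_consecutive_run(frame_numbers):
    """Length of the longest run of consecutive frame numbers."""
    best = run = 0
    previous = None
    for frame in sorted(set(frame_numbers)):
        run = run + 1 if previous is not None and frame == previous + 1 else 1
        best = max(best, run)
        previous = frame
    return best


def accept_candidates(candidates, min_run=5):
    """IDs of candidates detected in at least min_run consecutive frames.

    candidates: dict mapping a candidate ID to the frame numbers in which
    a spot was detected at that position.
    """
    return {cid for cid, frames in candidates.items()
            if longest_consecutive_run(frames) >= min_run}


candidates = {
    "a": [10, 11, 12, 13, 14],  # five consecutive frames -> accepted
    "b": [3, 4, 6, 7, 8],       # longest run is three -> rejected
    "c": [1],                   # single frame -> rejected
}
print(accept_candidates(candidates))
```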

      Indeed, Alexa 568 and 647 give different numbers of localizations. This depends on the intrinsic photophysics of the fluorophores: each fluorophore has a different duty cycle, switching cycle, and survival fraction. However, we note that we focused on capturing the relative changes in receptor numbers over time, before and after stimulation by ligands, not the absolute numbers of surface GHR and PRLR. We are not comparing the absolute numbers of localizations or drawing comparisons for localization numbers between 568 and 647. For all these different conditions/times, the photophysics for a particular fluorophore remains the same. This allows us to make relative comparisons.

      As far as the effect of BME is concerned, the concentration of mercaptoethanol needs to be carefully optimized, as too high a concentration can potentially quench the fluorescence or affect the overall stability of the sample. However, we are using an optimized concentration which has been previously validated across multiple STORM experiments. This makes the concerns relating to the concentration of BME irrelevant to the current experimental design. Besides, the concentration of BME is maintained across all experimental conditions.

      We have added information regarding PL and CRISPR/Cas9 for generating hGHR KO and hPRLR KO cells in two new subsections to the Methods section.

      Reviewer #2 (Recommendations for the authors):

      In the methods please include:

      (1) A section with details on proximity ligation assays.

      We have added a description of the PL method to the Methods section.

      (2) A section on CRISPR/Cas9 technology.

      We have added two new sections on “Generating hGHR knockout and hPRLR knockout T47D cells” and “Design of sgRNAs for hGHR or hPRLR knockout” to the Methods section.

      (3) List the precise composition of the buffer or cite the paper that you followed.

      We used the buffer recipe described in this protocol [1] and have added the components with concentrations as well as the following reference to the manuscript.

      (1) Beggs, R.R., Dean, W.F., Mattheyses, A.L. (2020). dSTORM Imaging and Analysis of Desmosome Architecture. In: Turksen, K. (eds) Permeability Barrier. Methods in Molecular Biology, vol 2367. Humana, New York, NY. https://doi.org/10.1007/7651_2020_325

      (4) Exposure time used for image acquisition to put 40 000 frames in the context of total imaging time and clarify why you decided to take 40 000 images per channel.

      Our Nikon Ti2 N-STORM microscope is equipped with an iXon DU-897 Ultra EMCCD camera from Andor (Oxford Instruments). According to the camera’s manufacturer, this camera platform uses a back-illuminated 512 x 512 frame transfer sensor and overclocks readout to 17 MHz, pushing speed performance to 56 fps (in full frame mode). We note that we always tried to acquire STORM images at the maximal frame rate. As for the exposure time, according to the manufacturer it can be as short as 17.8 ms. We would like to emphasize that we did not specify/alter the exposure time.

      See also: https://andor.oxinst.com/assets/uploads/products/andor/documents/andor-ixon-ultra-emccd-specifications.pdf

      The decision to acquire 40,000 frames per channel was based on our aim to capture the true population of the molecules of interest so that they are localized and accurately represented in the final reconstructed image. The total number of frames depends on sample complexity, labeling density, and the desired resolution. We tested between 20,000 and 60,000 frames and found that, for our experimental design and output requirements, 40,000 frames provided the best balance between maximal resolution and the number of localizations needed for consistent and accurate localization estimates across stimulation conditions and basal controls.
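
      To put the frame count in the context of total imaging time, as the reviewer requested: assuming acquisition in full-frame mode at the quoted maximum of 56 fps (the exposure time was not set explicitly, so the actual rate may differ), the acquisition time is simply the number of frames divided by the frame rate. The sketch below applies this to the tested range of 20,000 to 60,000 frames; the function name is illustrative.

```python
def acquisition_time_min(n_frames, fps):
    """Acquisition time in minutes for n_frames recorded at a fixed frame rate."""
    return n_frames / fps / 60


# Assumed frame rate: the camera's quoted full-frame maximum of 56 fps.
for n in (20_000, 40_000, 60_000):
    print(f"{n} frames: {acquisition_time_min(n, 56):.1f} min")
```

      Under this assumption, 40,000 frames take roughly 12 minutes; because 40,000 frames were taken per channel, imaging two channels doubles the total number of frames.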

      (5) The lasers used to switch Alexa 568 and Alexa 647. Were you alternating between the lasers for switching and imaging of dyes? Intermittent and continuous illumination will produce very different unspecific background fluorescence.

      Yes, we used an alternating approach for the lasers exciting Alexa 647 and Alexa 568, for both switching and imaging of the dyes.

      (6) A paragraph with a detailed description of methods used to differentiate the background fluorescence from the signal.

      We have addressed the background fluorescence under Point 1 (Public Review). We have added a paragraph in the Methods section on this issue.

      (7) Minor corrections to the text:

      It appears as though there is a large difference in the expression level of GHR and PRLR in basal conditions in Figure 1. This can be due to the switching properties of the dyes, which is related to the amount of BME in the buffer, or it can be because there is indeed more PRLR. Would the authors be able to comment on this?

      We thank the reviewer for this question. According to publicly available expression data, there is indeed more PRLR than GHR in T47D cells. According to CellMiner [1], T47D cells have an RNA-Seq gene expression level log2(FPKM + 1) of 6.814 for PRLR and 3.587 for GHR, strongly suggesting that there is more PRLR than GHR in basal conditions, matching the reviewer’s interpretation of our images in Fig. 1 (basal). However, we would advise against using STORM images for direct comparisons of receptor expression. First, with TIRF images, we are only looking at the membrane fraction (~150 nm from the coverslip-membrane interface) that is attached to the coverslip. Second, as discussed above, our data represent relative cell surface receptor levels that allow comparison of different conditions (basal vs. stimulation) and do not represent absolute quantifications. Everything is relative and in comparison to controls.

      Also, BME does not change the level of expression. The differences in receptor levels estimated by relative comparison can be attributed to actual changes in receptor levels and are not an artifact of the amount of BME in the buffer or of the properties of the dyes. These factors are kept constant across all experimental conditions and do not influence the final outcome.

      (1) https://discover.nci.nih.gov/cellminer/
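
      Because CellMiner reports expression as log2(FPKM + 1), the values quoted above can be converted back to the linear FPKM scale to gauge the size of the difference. The short Python calculation below does this. Note that it compares whole-cell mRNA levels and, as explained above, says nothing direct about receptor numbers in the TIRF-imaged membrane fraction.

```python
def fpkm_from_log2(value):
    """Invert the log2(FPKM + 1) transform."""
    return 2 ** value - 1


prlr_fpkm = fpkm_from_log2(6.814)   # CellMiner value for PRLR in T47D
ghr_fpkm = fpkm_from_log2(3.587)    # CellMiner value for GHR in T47D
print(f"PRLR ~{prlr_fpkm:.1f} FPKM, GHR ~{ghr_fpkm:.1f} FPKM, "
      f"ratio ~{prlr_fpkm / ghr_fpkm:.1f}")
```

      The back-transformed values differ by roughly tenfold at the mRNA level.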

      (8) I would encourage the authors to use unspecific binding to characterize the signal coming from single antibodies bound to the substrate. This would provide a mean number of localizations that a single antibody generates. With this information, one can evaluate how many receptors there are per cluster, which would strengthen the findings and potentially provide additional support for the model presented in Figure 8. It would also explain why the distributions of localisations per cluster in Fig. 3B look very different for hGHR and hPRLR. As the authors point out in the discussion, the results on predimerization of these receptors in basal conditions are conflicting and therefore it is important to shed more light on this topic.

      We thank the reviewer for this suggestion. While we are unable to perform this experiment at this stage, we will keep it in mind for future experiments.

      (9) Minor corrections to the figures:

      Figure 1:

      In the legend, please say what representation was used. Are these density maps or another representation? Please provide examples of actual localisations (either as dots or crosses representing the peaks of the Gaussians). Most findings of this work rely on the characterisation of the clusters of localisations and therefore it is of essence to show what the clusters look like. This could potentially go to the supplemental info to minimise additional work. It's very hard to see the puncta in this figure.

      If the authors created zoomed regions in each of the images (as in Figure 3), it would be much easier to evaluate the expression level and the extent of colocalisation. Halfway through GHR 3 min green pixels become grey, but this may be the issue with the document that was created. Please check. Either increase the font on the scale bars in this figure or delete it.

      As described above, Figure 1 does not show density maps. Imaging captured the fluorophores’ blinking events and localizations were counted as true localizations, when at least 5 consecutive blinking events had been observed. Nikon software was used for Gaussian fitting and smoothing.

      We have generated zoomed regions. In our files (original as well as pdf) we do not see pixels become grey. We increased the font size above one of the scale bars and removed all others.

      Figure 3:

      In A, the GHR clusters are colour coded but PRLR are not. Are both DBSCAN images? Explain the meaning of colour coding or show it as black and white. Was brightness also increased in the PRLR image? The font on the scale bars is too small. In B, right panels, the font on the axes is too small. In the figure legend explain the meaning of 33.3 and 16.7.

      In our document, both GHR and PRLR clusters are color coded, but the hGHR clusters are larger and therefore appear brighter than the hPRLR clusters. Both are DBSCAN images. The color coding only serves to distinguish different clusters; it has no other meaning. We have kept the color coding but have added a sentence to the caption explaining this. Brightness was increased equally in both images of Panel B. The values 33.3 and 16.7 are the median cluster sizes; we have added a sentence to the caption explaining this. We have also increased the font size on the axes in B (right panels).

      Figure 4:

      I struggled to see any colocalization in the 2nd and the 3rd image. Please show zoomed-in sections. In the panels B and C, the data are presented as fractions. Is this per cell? My interpretation is that ~80% of PRL clusters also contain GHR.

      Is this in agreement with Figures 1 and 2? In Figure 1, PRL 3 min, Merge, colocalization seems much smaller. Could the authors give the total numbers of GHR and PRLR from which the fractions were calculated at least in basal conditions?

      We have provided zoom-in views. As for panels B and C, fractions are number of clusters containing both receptors divided by the total number of clusters. We used the same strategy that we had used for calculating the localization changes: We randomly selected 4 ROIs (regions of interest) per cell to calculate fractions and then calculated the average of three different cells from independently repeated experiments. We did not calculate total numbers of GHR/PRLR. The numbers are fractions of cluster numbers.

      Moreover, the reviewer interprets the results in panels B and C as showing that ~80% of PRLR clusters also contain GHR. We assume the reviewer refers to the basal state. This interpretation is not correct for the following reason: ~80% of all clusters contain both receptors, but the panels do not reveal how many of the remaining (~20%) clusters contain only PRLR or only GHR. Only if 100% of clusters contained PRLR could we conclude that 80% of PRLR clusters also contain GHR.

      Also, while Figures 1 and 2 show localization based on dSTORM images, Figure 3 indicates and quantifies co-localization based on proximity ligation assays following DBSCAN analysis using Clus-DoC. We do not think that the results are directly comparable.
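      To make the distinction between the two fractions concrete, the short Python sketch below compares them on made-up cluster compositions (the labels and counts are hypothetical, not data from the manuscript). `both_over_all` corresponds to the quantity described for panels B and C (clusters containing both receptors divided by all clusters), whereas `both_over_prlr` corresponds to the reviewer's reading (the fraction of PRLR-containing clusters that also contain GHR). The function name is an illustrative assumption.

```python
def cluster_fractions(clusters):
    """Each cluster is a set of receptor labels detected in it, e.g. {"GHR", "PRLR"}."""
    total = len(clusters)
    both = sum(1 for c in clusters if {"GHR", "PRLR"} <= c)
    with_prlr = sum(1 for c in clusters if "PRLR" in c)  # assumes at least one PRLR cluster
    return {
        # fraction of all clusters that contain both receptors (panels B and C)
        "both_over_all": both / total,
        # fraction of PRLR-containing clusters that also contain GHR
        "both_over_prlr": both / with_prlr,
    }


# Hypothetical scenario 1: 80 mixed clusters, 20 GHR-only clusters
s1 = [{"GHR", "PRLR"}] * 80 + [{"GHR"}] * 20
# Hypothetical scenario 2: 80 mixed clusters, 10 GHR-only, 10 PRLR-only
s2 = [{"GHR", "PRLR"}] * 80 + [{"GHR"}] * 10 + [{"PRLR"}] * 10

print(cluster_fractions(s1))  # both_over_all = 0.8, both_over_prlr = 1.0
print(cluster_fractions(s2))  # both_over_all = 0.8, both_over_prlr = 80/90 (about 0.89)
```

      In both scenarios 80% of all clusters contain both receptors, yet the fraction of PRLR clusters that also contain GHR differs; the two numbers coincide only when every cluster contains PRLR.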

      Reviewer #3 (Public Review):

      (1) The manuscript suffers from a lack of detail, which in places makes it difficult to evaluate the data and would make it very difficult for the results to be replicated by others. In addition, the manuscript would very much benefit from a full discussion of the limitations of the study. For example, the manuscript is written as if there is only one form of the PRLR while the anti-PRLR antibody used for dSTORM would also recognize the intermediate form and short forms 1a and 1b on the T47D cells. Given the very different roles of these other PRLR forms in breast cancer (Dufau, Vonderhaar, Clevenger, Walker and other labs), this limitation should at the very least be discussed. Similarly, the manuscript is written as if Jak2 essentially only signals through STAT5 but Jak2 is involved in multiple other signaling pathways from the multiple PRLRs, including the long form. Also, while there are papers suggesting that PRL can be protective in breast cancer, the majority of publications in this area find that PRL promotes breast cancer. How then would the authors interpret the effect of PRL on GHR in light of all those non-protective results? [Check papers by Hallgeir Rui]

      We thank the reviewer for such thoughtful comments. We have added a paragraph to the Discussion section on the limitations of our study, including our sole focus on T47D and γ2A-JAK2 cells and the lack of PRLR isoform-specific data. We now also mention that these isoforms play different roles in breast cancer, citing papers by the Dufau, Vonderhaar, Clevenger, and Walker labs.

      We did not mean to imply that JAK2 signals only via STAT5 or by only binding the long form. We have made this point clear in the Introduction as well as in our revised Discussion section. Moreover, we have added information and references on JAK2 signaling and PRLR isoform specific signaling.

      In our Discussion section, we now also mention the findings that PRL promotes breast cancer. We would like to point out that it is conceivable that PRL is protective in BC by reducing surface hGHR availability, but that this effect may depend on JAK2 levels as well as on the expression levels of other kinases that competitively bind Box1 and/or Box2 [1]. PRL’s effect could also be BC stage-dependent. In any case, we have emphasized the speculative nature of our statement.

      (1) Chhabra, Y., Seiffert, P., Gormal, R.S., et al. Tyrosine kinases compete for growth hormone receptor binding and regulate receptor mobility and degradation. Cell Rep. 2023;42(5):112490. doi: 10.1016/j.celrep.2023.112490. PMID: 37163374.

      Reviewer #3 (Recommendations for the authors):

      Points for improvement of the manuscript:

      (1) Method details -

      a) "we utilized CRISPR/Cas9 to generate hPRLR knockout T47D cells ......" Exactly how? Nothing is said under methods. Can we be sure that you knocked out the whole gene?

      We have addressed this point by adding two new sections on “Generating hGHR knockout and hPRLR knockout T47D cells” and “Design of sgRNAs for hGHR or hPRLR knockout” to the Methods section.

      b) Some of the Western blots are missing molecular weight markers. How specific are the various antibodies used for Westerns? For example, the previous publications quoted as providing characterization of the antibodies also seem to use just band cutouts and do not show the full molecular weight range of blotted whole cell extracts. Anti-PRLR antibodies are notoriously bad, so this is important.

      There is an antibody referred to in Figure 5 that is not listed under "antibodies" in the methods.

      We have modified Figure 5a, showing the entire gel as well as molecular weight markers. As for specificity of our antibodies, we used monoclonal antibodies Anti-GHR-ext-mAB 74.3 and Anti-PRLR-ext-mAB 1.48, which have been previously tested and used. In addition, we did our own control experiments to ensure specificity. We have added some of our many control results as Supplementary Figures S2 and S3.

      We thank the reviewer for noticing the missing antibody in the Methods section. We have now added information about this antibody.

      c) There is no description of the proximity ligation assay.

      We have addressed this by adding a paragraph on PLA in the Methods section.

      d) What is the level of expression of GHR, PRLR, and Jak2 in the gamma2A-JAK2 cells compared to the T47D cells? Artifacts of overexpression are always a worry.

      The γ2A-JAK2 cell lines overexpress the receptors. For this reason, we did not rely solely on observations in γ2A-JAK2 cells but also performed the experiments in T47D cells.

      e) There are no concentrations given for components of the dSTORM imaging buffer. On line 380, I think the authors mean alternating lasers not alternatively.

      Thank you. Indeed, we meant alternating lasers. For the composition of the imaging buffer, we refer to [1], the protocol we followed.

      (1) Beggs, R.R., Dean, W.F., Mattheyses, A.L. (2020). dSTORM Imaging and Analysis of Desmosome Architecture. In: Turksen, K. (eds) Permeability Barrier. Methods in Molecular Biology, vol 2367. Humana, New York, NY. https://doi.org/10.1007/7651_2020_325

      f) In general, a read-through to determine whether there is enough detail for others to replicate is required. 4% PFA in what? Do you mean PBS or should it be Dulbecco's PBS etc., etc.?

      We prepared 4% PFA in PBS, specifically Dulbecco's PBS.

      (2) There are no controls shown or described for the dSTORM. For example, non-specific primary antibody and second antibodies alone for non-specific sticking. Do the second antibodies cross-react with the other primary antibody? Is there only one band when blotting whole cell extracts with the GHR antibody so we can be sure of specificity?

      We used monoclonal antibodies Anti-GHR-ext-mAB 74.3 and Anti-PRLR-ext-mAB 1.48 (but also tested several other antibodies). While these antibodies have been previously tested and used, we performed additional control experiments to ensure specificity of our primary antibodies and absence of non-specific binding of our secondary antibodies. We have added some of our many control results as Supplementary Figures S2 and S3.

      (3) Writing/figures-

      a) As discussed in the public review regarding different forms of the PRLR and the presence of other Jak2-dependent signaling

      We have added paragraphs on PRLR isoforms and other JAK2-dependent signaling pathways to the Introduction. Also, we have added a paragraph on PRLR isoforms (in the context of our findings) to the Discussion section.

      b) What are the units for figure 3c and d?

      The figures show numbers of localizations (obtained from fluorophore blinking events). In the figure caption to 3C and 3D, we have specified the unit (i.e. counts).

      c) The wheat germ agglutinin stains more than the plasma membrane and so this sentence needs some adjustment.

      We thank the reviewer for this comment. We have rephrased this sentence (see caption to Fig. 4).

      d) It might be better not to use the term "downregulation" since this is usually associated with expression and not internalization.

      While we understand the reviewer’s discomfort with the use of the word “downregulation”, we still think that it best describes the observed effect. Moreover, we would like to note that in the field of receptorology “downregulation” is a specific term for trafficking of cell surface receptors in response to ligands. That said, to address the reviewer’s comment, we are now using the terms “cell surface downregulation” or “downregulation of cell surface [..] receptor” throughout the manuscript in order to explicitly distinguish it from gene downregulation.

      e) Line 420 talks about "previous work", a term that usually indicates work from the same lab. My apologies if I am wrong, but the reference doesn't seem to be associated with the authors.

      At the end of the sentence containing the phrase “previous work”, we are referring to reference [57], which has Dr. Stuart Frank as senior and corresponding author. Dr. Frank is also a co-corresponding author on this manuscript. While in our opinion, “previous work” does not imply some sort of ownership, we are happy to confirm that one of us was responsible for the work we are referencing.

      Reviewing Editor's recommendations:

      The reviewers have all provided a very constructive assessment of the work and offered many useful suggestions to improve the manuscript. I'd advise thinking carefully about how many of these can be reasonably addressed. Most will not require further experiments. I consider it essential to improve the methods to ensure others could repeat the work. This includes adding methods for the PLA and including detail about the controls for the dSTORM. The reviewers have offered suggestions about types of controls to include if these have not already been done.

      We thank the editor for their recommendations. We have revised the methods section, which now includes a paragraph on PLA as well as on CRISPR/Cas9-based generation of mutant cell lines. We have also added information on the dSTORM buffer to the manuscript. Data of controls indicating antibody specificity (using confocal microscopy) have been added to the manuscript’s supplementary material (see Fig. S2 and S3).

      I agree with the reviewers that the different isoforms of the prolactin receptor need to be considered. I think this could be done as an acknowledgment and point of discussion.

      We have revised the Discussion section and have added a paragraph on, among other topics, the different PRLR isoforms.

      For Figure 2E, make it clear in the figure (or at least in legend) that the middle line is the basal condition.

      We thank the editor for their comment. We have made changes to Fig 2E and have added a sentence to the legend making it clear that the middle line depicts the basal condition.

      My biggest concern overall was the fact that this is all largely conducted in a single cell line. This was echoed by at least one of the reviewers. I wonder if you have replicated this in other breast cancer cell lines or mammary epithelial cells? I don't think this is necessary for the current manuscript but would increase confidence if available.

      We thank the editor for their comment and fully agree with their assessment. Unfortunately, we have not replicated these experiments in other BC cell lines or in mammary epithelial cells, but we would certainly like to do so in the near future.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the functional difference between the most commonly expressed form of PTH, and a novel point mutation in PTH identified in a patient with chronic hypocalcemia and hyperphosphatemia. The value of this mutant form of PTH as a potential anabolic agent for bone is investigated alongside PTH(1-84), which is a current anabolic therapy. The authors have achieved the aims of the study.

      Strengths:

      The work is novel, as it describes the function of a novel, naturally occurring, variant of PTH in terms of its ability to dimerise, to lead to cAMP activation, to increase serum calcium, and its pharmacological action compared to normal PTH.

      Recommendations for the authors:

      (1) In your response to the reviewers you included a figure. You said it was for the reviewers only. We are *not* including it here. Is that correct or should it be in the Public Reviews?

      We apologize for any confusion and appreciate your thorough review. The phrase “data only for reviewers” was intended to indicate that the content was included in the revision based on reviewers’ comments, not in the main text (article). However, we acknowledge that this phrasing may have been inappropriate. We agree to include the figure from the previous author response in the Public Reviews. Accordingly, we propose to revise the previous author response as follows:

      - Remove "(data only for reviewers)".

      - Correct the typo from "perosteal" to "periosteal".

      - “Thank you for your comment. First, we ensured that the bones sampled during the experiment showed no defects, and we carefully separated the femur bones from the mice to preserve their integrity. In the 3-point bending test, PTH treatment significantly increased the maximum load of the femur bone compared to the OVX-control group. Additionally, the maximum load in the PTH treatment group was significantly greater than that observed in the PTH dimer group. Furthermore, structural factors influencing bone strength, such as the periosteal perimeter and the endocortical bone perimeter, were also increased in the PTH treatment group compared to the PTH dimer group.”

      (2) Do you mean to always have R<sup>0</sup> (have a superscript) and RG (never have a superscript) or should they be shown in the same way throughout your paper?

      Thank you for your thorough review. Based on previous studies that addressed the conformation of PTH1R, R<sup>0</sup> is typically shown with a superscript, while RG is not (Hoare et al., 2001; Dean et al., 2006; Okazaki et al., 2008). We have followed this notation and will ensure consistency throughout our paper.

      Hoare, S. R., Gardella, T. J., & Usdin, T. B. (2001). Evaluating the signal transduction mechanism of the parathyroid hormone 1 receptor: effect of receptor-G-protein interaction on the ligand binding mechanism and receptor conformation. Journal of Biological Chemistry, 276(11), 7741-7753.

      Dean, T., Linglart, A., Mahon, M. J., Bastepe, M., Jüppner, H., Potts Jr, J. T., & Gardella, T. J. (2006). Mechanisms of ligand binding to the parathyroid hormone (PTH)/PTH-related protein receptor: selectivity of a modified PTH (1–15) radioligand for GαS-coupled receptor conformations. Molecular endocrinology, 20(4), 931-943.

      Okazaki, M., Ferrandon, S., Vilardaga, J. P., Bouxsein, M. L., Potts Jr, J. T., & Gardella, T. J. (2008). Prolonged signaling at the parathyroid hormone receptor by peptide ligands targeted to a specific receptor conformation. Proceedings of the National Academy of Sciences, 105(43), 16525-16530.

      (3) The following grammatical and fact changes and word changes are requested.

      We appreciate the thoughtful review and thank you for pointing out the grammatical, factual, and word changes required. We have carefully reviewed and addressed each of these corrections to ensure the paper's accuracy and readability.

      We appreciate the reviewers' detailed and constructive reviews. We have addressed all the comments to improve the quality of our paper.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We would like to sincerely thank the reviewers again for their insightful comments on the previous version of our manuscript. In the last round of review, the reviewers were mostly satisfied with our revision but raised a few suggestions and/or remaining concerns. We have further edited the manuscript to address these concerns.

      Reviewer #1:

      - An explicit, quantitative link between the RNN and fMRI data is perhaps a last point that would integrate the RNN conclusion and analyses in line with the human imaging data.

      Reviewer #2:

      - Few. While more could be perhaps done to understand the RNN-fMRI correspondence, the paper contributes a compelling set of empirical findings and interpretations that can inform future research.

      To align the RNN and fMRI results more quantitatively, we performed an additional representational similarity analysis (RSA). Specifically, we computed representational dissimilarity matrices (RDMs) for the fMRI and RNN data separately and calculated the correlation between the RDMs to quantify the similarity between the fMRI data and the different RNN models. Consistent with our main claims, RNN2 generally showed higher similarity to the fMRI data than RNN1. These results provide further support that RNN2 aligns better with human neuroimaging data. We have included this result (lines 496-505) and the corresponding figure (Figure 7) in the manuscript.
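      The RSA step described above can be sketched in a few lines of Python with NumPy. The response does not state which dissimilarity measure or which RDM correlation was used, so this sketch assumes 1 minus the Pearson correlation for RDM entries and a Spearman correlation between the off-diagonal RDM entries, a common choice in RSA. The function names, array shapes and synthetic data are hypothetical and only illustrate the technique, not the authors' pipeline.

```python
import numpy as np


def compute_rdm(patterns):
    """patterns: (n_conditions, n_features) array.

    Returns an (n_conditions, n_conditions) RDM whose entries are
    1 - Pearson correlation between the activity patterns of two conditions.
    """
    return 1.0 - np.corrcoef(patterns)


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries, i.e. each condition pair once."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def _ranks(values):
    """Ordinal ranks; assumes no exact ties, which is reasonable for continuous data."""
    ranks = np.empty(len(values))
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks


def rdm_similarity(rdm_a, rdm_b):
    """Spearman correlation between the condition-pair entries of two RDMs."""
    a = _ranks(upper_triangle(rdm_a))
    b = _ranks(upper_triangle(rdm_b))
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic example with 8 conditions; voxel and unit counts need not match,
# because RDMs compare conditions rather than features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(8, 3))  # hypothetical shared condition structure
fmri = latent @ rng.normal(size=(3, 200)) + 0.5 * rng.normal(size=(8, 200))
rnn_a = latent @ rng.normal(size=(3, 64)) + 0.5 * rng.normal(size=(8, 64))  # shares structure
rnn_b = rng.normal(size=(8, 64))  # unrelated structure

fmri_rdm = compute_rdm(fmri)
print("rnn_a vs fMRI:", rdm_similarity(fmri_rdm, compute_rdm(rnn_a)))
print("rnn_b vs fMRI:", rdm_similarity(fmri_rdm, compute_rdm(rnn_b)))
```

      Comparing RDMs rather than raw activity sidesteps the mismatch between the number of voxels and the number of RNN units; a rank-based correlation is used here because it makes no assumption that the two dissimilarity scales are linearly related.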

      Reviewer #1:

      - As Rev 2 mentions, multiple types of information codes may be present, and the response letter Figure 5 using representational similarity (RSA) gets at this question. It would strengthen the work to, at minimum, include this analysis as an extended or supplemental figure.

      Following this suggestion, we have now included Response Letter Figure 5 from the previous round of review in the manuscript (lines 381-387 and Appendix 1 – figure 7).

      Reviewer #1:

      - To sum up the results, a possible, brief schematic of each cortical area analyzed and its contribution to information coding in WM and successful subsequent behavior may help readers take away important conclusions of the cortical circuitry involved.

      Following this suggestion, we have added a schematic figure illustrating the contribution of each cortical region in our experiment to better summarize our findings (Figure 8).

      We hope that these changes further clarify the issues and strengthen the key claims in our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      The manuscript consists of two separate but interlinked investigations: genomic epidemiology and virulence assessment of Salmonella Dublin. ST10 dominates the epidemiological landscape of S. Dublin, while ST74 was uncommonly isolated. Detailed genomic epidemiology of ST10 unfolded the evolutionary history of this common genotype, highlighting clonal expansions linked to each distinct geography. Notably, North American ST10 was associated with more antimicrobial resistance compared to others. The authors also performed long-read sequencing on a subset of isolates (ST10 and ST74) and uncovered a novel recombinant virulence plasmid in ST10 (IncX1/IncFII/IncN). Separately, the authors performed cell invasion and cytotoxicity assays on the two S. Dublin genotypes, showing differential responses between the two STs. ST74 replicates better intracellularly in macrophages compared to ST10, but both STs induced comparable cytotoxicity levels.

      Comparative genomic analyses between the two genotypes showed certain genetic content unique to each genotype, but no further analyses were conducted to investigate which genetic factors were likely associated with the observed differences. The study provides a comprehensive and novel understanding of the evolution and adaptation of two S. Dublin genotypes, which can inform public health measures. 

      The methodology included in both approaches was sound and written in sufficient detail, and data analysis was performed with rigour. Source data were fully presented and accessible to readers. Certain aspects of the manuscript could be clarified and extended to improve the manuscript. 

      (1) For epidemiology purposes, it is not clear which human diseases were associated with the genomes included in this manuscript. This is important since S. Dublin can cause invasive bloodstream infections in humans. While such information may be unavailable for public sequences, this should be detailed for the 53 isolates sequenced for this study, especially for isolates selected to perform experiments in vitro.

      Thank you for the suggestion. We have added the sample type for the 53 isolates sequenced for this study. These additional details have been added to Supplementary Tables 1, 4, 9 and 10.

      (2) The major AMR plasmid described in S. Dublin was the IncC plasmid associated with clonal expansion in North America. While this plasmid is not found in the Australian isolates sequenced in this study, the reviewer finds that it is still important to include its characterization, since it carries blaCMY-2 and was stably inherited in ST10 clade 5. If the plasmid structure is already published, the authors should include the accession number in the Main Results.

      In the Introduction, we now provide accessions and context for two of the IncC hybrid plasmids that have been previously reported in the literature. The text now reads:

      “These MDR S. Dublin isolates all type as sequence type 10 (ST10), and the AMR determinants have been demonstrated to be carried on an IncC plasmid that has recombined with a virulence plasmid encoding the spvRABCD operon (12,16,18,19).  This has resulted in hybrid virulence and AMR plasmids circulating in North America including a 329kb megaplasmid with IncX1, IncFIA, IncFIB, and IncFII replicons (isolate CVM22429, NCBI accession CP032397.1) (12,16) and a smaller hybrid plasmid 172,265 bases in size with an IncX1 replicon (isolate N13-01125, NCBI accession KX815983.1) (19).”

      Further characterisation of the IncA/C plasmid circulating in North America was beyond the scope of this study.

      (a) The reviewer is concerned that the multiple annotations missing in plasmid structures in Supplementary Figures 5 & 6, and genetic content unique to ST10 and ST74 was due to insufficient annotation by Prokka. I would recommend the authors use another annotation tool, such as Bakta (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8743544/) for plasmid annotation, and reconstruction of the pangenome described in Supplementary Figure 10. Since the recombinant virulence plasmid in ST10 is a novel one, I would recommend putting Supplementary Figure 5 as a main figure, with better annotations to show the virulence region, plasmid maintenance/replication, and possible conjugation cluster.

      In the supplementary figures of the plasmids, we sought to highlight key traits of interest, namely plasmid replicons, antimicrobial resistance and heavy metal resistance (Supplementary Figure 5) and virulence genes (Supplementary Figure 6). The included accessions of publicly available isolates provide references for characterised plasmids such as the S. Dublin virulence plasmid (NCBI accession: CP001143).

      For the potentially hybrid plasmid with IncN/IncX1/IncFII reported in Supplementary Figure 6, we have undertaken additional analyses of the two Australian isolates, reannotating them with Bakta, which provides more detailed annotations.

      We have added new text to the methods which reads as: 

      “The final genome assemblies were confirmed as S. Dublin using SISTR and annotated using both Prokka v1.14.6 (69) for consistency with the draft genome assemblies and  Bakta v1.10.1 (93) which provides for more detailed annotations (Supplementary Table 13). Both Prokka and Bakta annotations were in agreement for AMR, HMR and virulence genes, with Bakta annotating between 3-7 additional CDS which were largely ‘hypothetical protein’.”

      For the pangenome analysis of the seven ST74 and ten ST10 isolates, we have continued to use the Prokka annotated draft genome assemblies for input to Panaroo. 

      (4) The authors are lauded for the use of multiple strains of ST10 and ST74 in the in vitro experiment. While results for ST74 were more consistent, readouts from ST10 were more heterogeneous (Figures 5 and 6). This is interesting as the tested ST10 were mostly clade 1, so ST10 was, as expected, of lower genetic diversity compared to tested ST74 (partly shown in Figure 1D). Could the authors confirm this by constructing an SNP table separately for tested ST10 and ST74? Additionally, the tested ST10 did not represent the phylogenetic diversity of the global epidemiology, and this limitation should be reflected in the Discussion.

      In response to the reviewer’s comments, we have provided a detailed SNP table (Supplementary Table 12) to further clarify the genetic diversity within the tested ST10 and ST74 strains. 

      Additionally, we have expanded on the limitation regarding the phylogenetic diversity of the ST10 isolates in the Discussion, highlighting how the strains used in the in vitro experiments may not fully represent the global epidemiological diversity of S. Dublin ST10. The new text now reads:

      “This study has limitations, including a focus on ST10 isolates from clade 1, which do not represent global phylogenetic diversity. Nonetheless, our pangenome analysis identified >900 uncharacterised genes unique to ST74, offering potential targets for future research. Another limitation is the geographic bias in available genomes, with underrepresentation from Asia and South America. This reflects broader disparities in genomic research resources but may improve as public health genomics capacity expands globally.”
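      The response to point (4) does not say which tool produced the SNP table (Supplementary Table 12). As a rough illustration of what a pairwise SNP table contains, the Python sketch below counts differences between every pair of sequences in a core-genome alignment, skipping positions where either sequence has a gap or an ambiguous base. The function names, the gap-handling rule and the toy isolate names are assumptions for illustration only.

```python
from itertools import combinations

UNAMBIGUOUS = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both bases are A/C/G/T and they differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in UNAMBIGUOUS and y in UNAMBIGUOUS and x != y
    )


def snp_table(alignment):
    """alignment: mapping of isolate name -> aligned sequence.

    Returns {(isolate_1, isolate_2): SNP count} for every unordered pair.
    """
    return {
        (a, b): snp_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


# Toy alignment with hypothetical isolate names
toy = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTTCGTAC",
    "isolate_3": "ACNTTCGAAC",
}
for pair, snps in snp_table(toy).items():
    print(pair, snps)
# ('isolate_1', 'isolate_2') 1
# ('isolate_1', 'isolate_3') 2
# ('isolate_2', 'isolate_3') 1
```

      Summarising such a table separately for the tested ST10 and ST74 isolates (for example, as the median pairwise SNP count within each group) is one way to compare the within-genotype diversity the reviewer asked about.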

      (5) The comparative genomics between ST10 and ST74 can be further improved to allow more interpretation of the experiments. Why were only SPI-1, 2, 6, and 19 included in the search for virulome, how about other SPIs? ST74 lacks SPI-19 and has truncated SPI-6, so what would explain the larger genome size of ST74? Have the authors screened for other SPIs using more well-annotated databases or references (S. Typhi CT18 or S. Typhimurium ST313)? The mismatching between in silico prediction of invasiveness and phenotypes also warrants a brief discussion, perhaps linked to bigger ST74 genome size (as intracellular lifestyle is usually linked with genome degradation).

      Systematic screening for SPIs with detailed reporting on individual genes and known effectors is still an area of development in Salmonella comparative genomics. In our characterisation of the virulome in this S. Dublin dataset, we decided to focus on SPI-1, SPI-2, SPI-6 and SPI-19, as these had been identified in previous studies and were considered most likely to be linked to the invasive phenotype of S. Dublin. We reasoned that the truncation of SPI-6 and lack of SPI-19 in ST74 compared to the ST10 isolates would provide a basis to explore genomic differences between the two genotypes; the screening for individual genes on each SPI is reported in Supplementary Figure 7 and Supplementary Table 9.

      We have expanded upon the mismatch between the in silico prediction of invasiveness and the phenotypes in the Discussion. We now explore the increased genome size and intracellular replication of the ST74 population. Invasiveness has not been studied as thoroughly in zoonotic iNTS as in human-adapted iNTS and S. Typhi, and we hypothesise that the increased genome content may be required for survival in different host species. The new text now reads:

      “Our phenotypic data demonstrated a striking difference in replication dynamics between ST10 and ST74 populations in human macrophages. ST74 isolates replicated significantly over 24 hours, whereas ST10 isolates were rapidly cleared after 9 hours of infection. ST74 induced significantly less host cell death during the early-mid stage of macrophage infection, supported by limited processing and release of IL-1β at 9 hpi. While NTS are generally potent inflammasome activators (60), most supporting data come from laboratory-adapted S. Typhimurium strains. Our findings suggest that ST74 isolates may employ immune evasion mechanisms to avoid host recognition and activation of cell death signaling in early infection stages. Similar trends have been observed with S. Typhimurium ST313, which induces less inflammasome activation than ST19 during murine macrophage infection (61). This could facilitate increased replication and dissemination at later stages of infection. Consistent with this, we observed comparable cytotoxicity between ST10 and ST74 isolates at 24 hpi, suggesting ST74 induces cell death via alternative mechanisms once intracellular bacterial numbers are unsustainable. Further research is needed to identify genomic factors underpinning these observations.”

      (6) On the epidemiology scale, ST10 is more successful, perhaps due to its ongoing adaptation to replication inside GI epithelial cells, favouring shedding. ST74 may tend to cause more invasive disease and less transmission via fecal shedding. The presence of T6SS in ST10 also can benefit its competition with other gut commensals, overcoming gut colonization resistance. The reviewer thinks that these details should be more clearly rephrased in the Discussion, as the results highly suggested different adaptations of two genotypes of the same serovar, leading to different epidemiological success.

      We thank the reviewer for highlighting that we could rephrase this important point. We have added text to the Discussion to better interpret the differences between the two genotypes of S. Dublin and how these relate to differences in epidemiological success. The new text now reads:

      “While machine learning predicted lower invasiveness for ST74 compared to ST10, the increased genomic content of ST74 may support higher replication in macrophages. We speculate that increased intracellular replication could enhance systemic dissemination, though this requires in vivo validation. Invasiveness of S. enterica is often linked to genome degradation (4,62–64). However, this is mostly based on studies of human-adapted iNTS (ST313) and S. Typhi, leaving open the possibility that the additional genomic content of ST74 supports survival in diverse host species. An uncharacterised virulence factor may underlie this replication advantage. Collectively, these findings highlight phenotypic differences between S. Dublin populations ST10 and ST74. Enhanced intra-macrophage survival of ST74 could promote invasive disease, whereas the prevalence of ST10 may relate to better intestinal adaptation and enhanced faecal shedding. In vivo models are needed to test this hypothesis. Interestingly, the absence of SPI-19 in ST74, which encodes a T6SS, may reflect adaptation to enhanced replication in macrophages. SPI-19 has been linked to intestinal colonisation in poultry (23,56) and mucosal virulence in mice (56). It’s possible that the efficient replication of ST74 in macrophages might compensate for the absence of SPI-19, relying instead on phagocyte uptake via M cells or dendritic cells. The larger pangenome of ST74 compared to ST10 could further enhance survival within hosts. These findings highlight important knowledge gaps in zoonotic NTS host-pathogen interactions and drivers of emerging invasive NTS lineages with broad host ranges.”

      Reviewer #2 (Public review): 

      This is a comprehensive analysis of Salmonella Dublin genomes that offers insights into the global spread of this pathogen and region-specific traits that are important to understanding its evolution. The phenotyping of isolates of ST10 and ST74 also offers insights into the variability that can be seen in S. Dublin, which is also seen in other Salmonella serovars, and reminds the field that it is important to look beyond lab-adapted strains to truly understand these pathogens. This is a valuable contribution to the field. The only limitation, which the authors also acknowledge, is the bias towards S. Dublin genomes from high-income settings. However, there is no selection bias; this is simply a consequence of publicly available sequences.

      Reviewer #1 (Recommendations for the authors): 

      (1) The Abstract does not summarize the main findings of the study. The authors should rewrite it to highlight the key findings in genomic epidemiology (generally low AMR, the novel plasmid and its Inc type, etc.) and the in vitro experiments. The findings clearly illustrate the differing adaptations of the two genotypes. The reviewer suggests omitting 'economic burden' and 'livestock', as this study did not specifically address them.

      We agree with the Reviewer and have re-written the abstract to directly reflect the major outcomes of the research. We have also deleted wording such as ‘livestock’, ‘economic burden’ and ‘One Health’ as we did not specifically address these issues as highlighted by the Reviewer. 

      (2) Figure 2: The MCC tree should include posterior support at major internal nodes. The current colour scheme is also confusing to readers (columns 1, 2). The reviewer suggests revising it and including additional key information as columns: major AMR genes (blaCMY-2, strAB, floR) and the mer locus, so that this information can be visualized in the main figure.

      Thank you for your valuable feedback. We have revised Figure 2 with the MCC tree to include posterior support on the internal nodes. We have also amended the figure legend to explain the additional coloured internal nodes, and amended the heatmap in Figure 2 to include additional white space between the columns so that readers can distinguish them more easily. We did not change the colours in this figure, as we have used the same colours throughout for the different traits reported in this study. Further, we chose to keep the AMR profiles reported in Figure 2 at the level of susceptible, resistant or MDR. This was done to convey an overview of the AMR profiles; details of the AMR and HMR determinants are provided in the Supplementary Figures and Tables.

      (3) The manuscript title is not informative, as the study did not examine the 'dynamics' of the two genotypes. The reviewer suggests revising the title to reflect the main results.

      Thank you for the feedback on the title. We have amended this to better reflect the main findings of the study, and it now reads as “Distinct adaptation and epidemiological success of different genotypes within Salmonella enterica serovar Dublin”

      (4) The co-occurrence of AMR and heavy metal resistance genes (such as mer) is quite common in Salmonella and E. coli. This is not a novel finding. The reviewer would suggest shortening the details related to heavy metal resistance in the Results and Discussion to make the writing more streamlined.

      In line with the Reviewer comments, we have shortened the details in the Results and Discussion on the co-occurrence of AMR and HMR.  

      (5) L185: missing info after n=82. 

      This has been revised to now read as “n=82 from Canada”. 

      (6) I think Vi refers to the capsular antigen, not flagella. Please double-check this.

      Thank you for highlighting this mistake. We have revised all instances.

      (7) L252-253: Which statistic was used to state 'no association'? Also, there is no evidence presented to support 'no fitness cost associated with resistance and virulence'.

      We have removed this sentence.

      (8) L320: Figure 6F is a scatterplot, not a PCA plot. Please confirm.

      The reviewer is correct, this is in fact a scatterplot. We have amended the figure legend and text.

      (9) In the Discussion, it would be helpful to compare the phenotypic findings with those of other invasive Salmonella, such as Typhi or Typhimurium ST313.

      Thank you for noting this. We had alluded to findings from ST313 but have now expanded this to include further comparisons to S. Typhimurium ST313 and added references for these within the Discussion. The additional text now reads:

      “Similar trends have been observed with S. Typhimurium ST313, which induces less inflammasome activation than ST19 during murine macrophage infection (61). This could facilitate increased replication and dissemination at later stages of infection.”

      “Invasiveness of S. enterica is often linked to genome degradation (4,62–64). However, this is mostly based on studies of human-adapted iNTS (ST313) and S. Typhi, leaving open the possibility that the additional genomic content of ST74 supports survival in diverse host species. An uncharacterised virulence factor may underlie this replication advantage.”

      (10) L440: no evidence for "successful colonization" of ST74. Actually, the findings suggested otherwise.

      Thank you for picking this up, we have amended the sentence to better reflect the findings. The amended text now reads as:

      “It’s possible that the efficient replication of ST74 in macrophages might compensate for the absence of SPI-19, relying instead on phagocyte uptake via M cells or dendritic cells. The larger pangenome of ST74 compared to ST10 could further enhance survival within hosts.”

      (11) L460-461: The data did not show an increasing trend of iNTS related to S. Dublin.

      Thank you for identifying this. This sentence has been revised accordingly and now reads as:

      “While the data did not indicate an increasing trend of iNTS associated with S. Dublin, the potential public health risk of this pathogen suggests it may still warrant considering it a notifiable disease, similar to typhoid and paratyphoid fever.”

      (12) L465: Data were not analyzed explicitly in the context of animal vs. human. Suggest omitting 'One Health' from the conclusion.

      Thank you for the suggestion. We have omitted “One Health” from the conclusion.

      (13) L500: Was the alignment not checked for recombination using Gubbins? The approach here is inconsistent with the method described in the subtree selected for BEAST analysis (L546).

      We have now applied Gubbins to the phylogenetic tree constructed using IQTREE, and the methods and results have been updated accordingly.

      (14) What was the output of Tempest? Correlation or R² value?

      We have now included the R² value from Tempest and reported this in the manuscript.

      (15) L556: marginal likelihood to allow evaluation of the best-fit model. Please rephrase to state this clearly.

      We have rephrased this in the manuscript to state this clearly.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The main observation that sperm from the CRISP1/CRISP3 double KO line are less developmentally competent after fertilization is convincing. However, the molecular characterization of the mechanism that leads to these defects, and of the temporal appearance of the defects, requires additional studies.

      We thank the reviewer for the valuable comments. As requested, additional experiments were carried out to analyze both the molecular mechanisms and the temporal appearance of the observed defects. Our results showed that DNA integrity defects appear during epididymal maturation and/or storage (see Figure 5B), that the epididymal fluid contributes to sperm DNA fragmentation defects (see Figure 6A), and that these defects do not appear to be due to an increase in oxidative stress (Figure 5C) but rather to a dysregulation of Ca<sup>2+</sup> homeostasis within the epididymis (Figure 6A,B).

      Strengths:

      The generation of these double mutant mice is valuable for the field. Moreover, the fact that the double mutant line of Crisp 1 and 3 is phenotypically different from the Crisp 1 and 4 line suggests different functions of these epididymal proteins. The methods used to demonstrate that developmental defects are largely due to post-fertilization defects are also a considerable strength. The initial characterization showing that these sperm have altered intracellular Ca<sup>2+</sup> levels and increased rates of DNA fragmentation is valuable.

      We thank the reviewer for the positive comments on our work.

      Weaknesses:

      The study is mechanistically incomplete because there is no direct demonstration that the absence of these proteins alters the epididymal environment and fluid in which sperm become affected during their passage through the epididymis. Also, the mechanism by which the loss of these proteins leads to DNA damage and increased Ca<sup>2+</sup> levels requires further characterization.

      The new experiments included in the revised version (see Figure 6A) showed that exposure of control WT sperm to epididymal fluid from mutant mice leads to an increase in sperm DNA fragmentation levels, confirming that the absence of CRISP1 and CRISP3 alters the epididymal fluid in which sperm become affected. In addition, new observations showing that WT sperm exposed to WT epididymal fluid in the presence of Ca<sup>2+</sup> also exhibit higher DNA fragmentation levels (Figure 6A), together with the finding that mutant sperm exhibit higher intracellular Ca<sup>2+</sup> levels (Figure 6B) but not higher levels of ROS (Figure 5C), strongly support a dysregulation of Ca<sup>2+</sup> homeostasis within the epididymis and sperm as the main cause of the DNA integrity defects.

      Reviewer #2 (Public Review):

      The authors showed that CRISP1 and CRISP3, secreted proteins in the epididymis, are required for early embryogenesis after fertilization through their role in maintaining DNA integrity in cauda epididymal sperm. This paper is the first report showing that epididymal proteins are required for embryogenesis after fertilization. However, some data in this paper (Table 1 and Figure 2A) overlap with those in a published paper (Curci et al., FASEB J, 34,15718-15733, 2020; PMID: 33037689). Furthermore, the authors did not provide direct evidence explaining why the disruption of CRISP1/3 leads to these phenomena (the increased intracellular Ca<sup>2+</sup> level and impaired DNA integrity in sperm). Therefore, if the authors can address the following comments to improve the paper's novelty and clarity, this paper may be worthwhile to readers.

      We thank the reviewer for the constructive comments. Regarding the data included in Table 1 and Figure 2A, it is important to note that Table 1 includes previously unpublished embryo development data for C1/C4 DKO mice, for which the embryo development data for C1/C3 DKO mice were used as a simultaneous control. Figure 2A shows in vivo fertilization results at short times after mating (4 h instead of 18 h), which have not been reported before either.

      Regarding why the disruption of CRISP1 and CRISP3 leads to defects in DNA integrity and Ca<sup>2+</sup> levels, we have carried out new experiments showing that mutant sperm do not exhibit higher levels of ROS (see Figure 5C), which argues against oxidative stress as the mechanism underlying the mutant sperm defects. In addition, we found that DNA integrity defects develop during epididymal transit (Figure 5B) and that exposure of WT sperm to epididymal fluid from mutant mice leads to an increase in sperm DNA fragmentation levels (Figure 6A), confirming that the absence of CRISP1 and CRISP3 alters the epididymal fluid. Finally, our new results showing that WT sperm exposed to WT epididymal fluid in the presence of Ca<sup>2+</sup> also exhibit higher DNA fragmentation levels (Figure 6A), together with the higher intracellular Ca<sup>2+</sup> levels detected in mutant sperm (Figure 6B), strongly support a dysregulation of Ca<sup>2+</sup> homeostasis within the epididymis and sperm as the main cause of the DNA integrity defects.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Overall comments:

      This manuscript investigates the mechanisms whereby the absence of the epididymal CRISP proteins 1 and 3 (Cysteine-Rich Secretory Proteins) causes infertility and lower embryo developmental rates. This strain's infertility seems to have a post-fertilization origin because the rates of in vivo fertilization are similar to those of controls, but development to the blastocyst stage is decreased. The results of this study show that:

      (1) mutant sperm viability, progressive motility, and morphology are normal;

      (2) in vivo fertilization rates are comparable to controls, but embryo development is reduced;

      (3) in vitro fertilization studies found reduced fertilization rates and activation rates even in zona-free studies;

      (4) additional functional studies showed increased rates of DNA fragmentation and elevated Ca<sup>2+</sup> levels in mutant sperm.

      The results presented are credible and hint that the epididymis might play a role before and after fertilization and directly affect embryo development. However, the study is mechanistically incomplete, as there is no direct demonstration that the absence of these proteins alters the epididymal environment and fluid in which sperm become functionally defective during their passage through the epididymis. Nor is it shown whether mutant epididymal fluid can reduce the developmental competence of control sperm, or whether control epididymal fluid or purified CRISP proteins can restore that of mutant sperm, inducing functional changes in the counterpart sperm. In summary, the main observation that sperm from the CRISP1/CRISP3 double KO line are less developmentally competent after fertilization is significant and important, but the molecular characterization of the defects and of their temporal appearance requires additional studies.

      Specific comments:

      (1) Introduction.

      It is too long. The description of the function of the epididymis should be reduced. The functional properties of the Crisp genes should also be substantially shortened.

      As requested, the Introduction has been revised and the descriptions of the epididymis and CRISP have been shortened.

      (2) Results.

      • Lines 140 to 142. Remove these initial lines. Start directly addressing the results of the C1/C3 strain, which is the mutant under consideration here. Referring to the C1/C4 results detracts from the focus of the study.

      As suggested by the reviewer, lines 140 to 142 have been removed.

      • Table 1. Move the two-cell embryo line to the top of the Table and place the Blastocyst line below it. This organization is the conventional method to present this type of data.

      As suggested, the order of the lines in Table 1 has been modified to align with the conventional presentation method.

      • Figures 1 and 2A and B data are solid and support the notion that enough sperm reach the site of fertilization, and that the sperm are defective in their capacity to support embryo development. Figures 2C and D have interesting data, although additional information would strengthen these results. The authors concluded that the sperm were defective in the epididymis. Where in the epididymis? These sperm were all from the cauda. Could the authors collect sperm from the upper portion of the cauda, or midportion, and compare if the defects manifest gradually?

      We appreciate this interesting and appropriate comment from the reviewer. All the studies in our work were carried out using sperm from the whole cauda epididymis, which is why we could not determine where in the epididymis defective sperm appear. In view of this, we have now conducted a comparative DNA fragmentation analysis between caput and cauda sperm from both genotypes. Our findings indicate that while cauda mutant sperm once again showed higher DNA fragmentation levels than controls, caput sperm exhibited levels of DNA damage that did not differ significantly between genotypes. These results confirm that defects in DNA appear after sperm pass through the epididymal caput, supporting the hypothesis that DNA fragmentation defects manifest during sperm transit through the epididymis and/or during storage in the cauda. These results have been included in the revised version of the manuscript (see lines 235-240/Figure 5B of the revised version).

      • Figure 3 displays the results of in vitro fertilization, either COCs A-C or zona-free fertilization D-F. The results are important and differ from those produced by fertilization in vivo. The authors indicate that these confirm that the in vivo conditions overcome in vitro defects. However, this study never addresses the reason behind it. Is there less expression of proteins related to these functions, or the function of some proteins is compromised? The authors should advance a hypothesis or a rationale to explain these results.

      As indicated by the reviewer, our results showed differences between the fertilization rates observed for mutant mice under in vivo and in vitro conditions, as previously observed for all our single and multiple KO models (Da Ros et al., 2008; PMID: 18571638, Brukman et al., 2016; PMID: 26786179, Weigel Muñoz, 2018; PMID: 29481619, Ernesto et al., 2015; PMID: 26416967, Carvajal et al., 2018; PMID: 30510210) and also reported by other groups (Okabe et al., 2007; PMID: 17558467). In this regard, it is well established that, although millions of sperm are ejaculated into the female tract, only a few (approximately one per oocyte) reach the fertilization site (i.e. the ampulla) (Cummins and Yanagimachi, 1982; doi:10.1002/mrd.1120050304). This efficient selection system in the female reproductive tract ensures that only the best sperm arrive at the fertilization site, even in males with reproductive deficiencies, thereby “masking” sperm defects that can be detected under in vitro conditions, where good and bad quality sperm compete for the egg. Thus, although we cannot exclude other mechanisms to explain the commonly observed differences between in vivo and in vitro fertilization rates, our rationale is that the natural and efficient sperm selection process that takes place within the female reproductive tract masks sperm defects that can otherwise be detected under the competitive in vitro conditions. This explanation is now included in the Discussion of the revised version of the manuscript (see lines 320-325).

      • Data in Figures 4 and 5 support the interpretation of the authors. However, it is necessary to establish the level of oxidative stress in the mutant sperm vs. the controls. Also, a question to explore is how long sperm need to reside in that mutant environment before they start undergoing the reported DNA fragmentation.

      In response to the valuable request from the reviewer regarding the level of oxidative stress in sperm, we have analyzed reactive oxygen species (ROS) levels in mutant and control epididymal sperm. Our results showed that ROS levels in mutant sperm were not higher than those observed in the control group, supporting the idea that mechanisms other than oxidative stress may be leading to the increased DNA fragmentation observed in mutant sperm. These results are now included in the revised version of the manuscript (see Figure 5C).

      Regarding the question on how long the sperm need to reside in the mutant environment to undergo DNA fragmentation, recent experiments carried out in response to this reviewer in which we analyzed DNA fragmentation in caput sperm led us to conclude that DNA fragmentation develops during epididymal transit and/or storage in the cauda. While these observations do not precisely define the time within the epididymis that sperm require for exhibiting DNA fragmentation, our additional new in vitro experiments analyzing the effect of epididymal fluids on sperm DNA integrity showed that exposure of WT sperm to DKO fluid for only 1 hr already leads to an increase in DNA fragmentation (see Figure 6A of the revised manuscript), suggesting that sperm do not need long periods within the mutant environment to be affected.

      (3) The length of the Discussion section should be shortened, especially by not recapitulating data presented in the Results section.

      As requested by the reviewer, sections recapitulating results have been modified.

      Minor comments:

      (1) The sentence in lines 171 and 172 is unclear, "However, despite the short time after mating, once again, the in vivo fertilized eggs corresponding to the mutant group exhibited clear defects to reach the blastocyst stage in vitro compared to controls." What do the authors mean by short time? It is the expected time, correct?

      It is well established that after copulatory plug formation, most oocytes are fertilized within 2 to 8 h, with fertilization rates that increase over time: 0–5% at 1.5 h post-mating, 40% at 4 h post-mating and more than 90% at 7 h after mating (Muro et al., 2016; PMID: 26962112, La Spina et al., 2016; PMID: 26872876). To examine whether the embryo development defects observed for mutant mice were due to a delayed arrival of sperm at the ampulla, we analyzed the percentage of fertilized eggs recovered from the ampulla at a “short time” (4 h) after mating. This avoided the possibility that the prolonged stay of sperm within the female tract under the usual “overnight mating” schedule could give defective sperm enough time to reach the ampulla and eventually fertilize the eggs (i.e. delayed fertilization). Our results showed that, despite the expected lower fertilization rates observed for both control and mutant males when analyzed just 4 h after mating, the fertilized eggs corresponding to the mutant group still exhibited clear defects in developing into blastocysts compared to controls, which argues against the idea that embryo development defects were due to delayed fertilization. The sentence in lines 171 and 172 has been modified in the revised version of the manuscript to better explain this conclusion (see lines 152-155 of the revised version).

      (2) Line 177. Mutant epididymal sperm already carry defects leading to embryo development failure. Under this subheading, the authors compare within the same female the ability of mutant and control sperm delivered into different horns to support fertilization and embryo development. They show that the embryo development induced by mutant sperm is diminished vs. controls under very similar conditions, confirming the previous results of post-fertilization failure. The data also answer the question raised by the authors of whether the fertilization defects appear during or after epididymal transit; the interpretation of the results is that the functional defects in the sperm are present before transport into the female tract. Important unaddressed questions are: could these defects begin even earlier, before the sperm arrive at the cauda? Did the authors try to incubate the mutant sperm with the epididymal fluid of WT mice to examine whether the sperm defects could be rescued? The opposite experiment could also be performed, where WT sperm are incubated with the epididymal fluid of mutant mice, and the treated sperm examined for altered Ca<sup>2+</sup> levels or DNA fragmentation.

      First of all, we would like to clarify that our question about whether the fertilization defects appear “during or after epididymal transit” was in fact referring to whether defects appear during epididymal maturation or later on, at the moment of ejaculation. In this regard, our in vivo and in vitro fertilization studies allowed us to conclude that defects were already present in epididymal sperm without excluding the possibility that additional defects could appear at the vas deferens or at the moment of ejaculation due to the contribution of seminal plasma secretions.

      Regarding whether sperm defects could appear even earlier, before arriving at the cauda, we have now analyzed DNA fragmentation in caput vs cauda sperm from both mutant and control mice, observing differences between genotypes only for cauda sperm. Based on these observations, we conclude that DNA integrity defects appear within the epididymis after sperm pass through the caput, either when sperm reach the corpus or the cauda epididymis, or during their storage within the cauda region.

      Also, as suggested by the reviewer, we incubated WT sperm in vitro with epididymal fluid from DKO mice (and vice versa) and then analyzed DNA fragmentation levels. Results showed that exposure of control sperm to the mutant epididymal fluid for 1 hr significantly increased DNA fragmentation levels. When mutant sperm (exhibiting higher levels of DNA fragmentation than control sperm) were exposed to epididymal fluid from WT mice, no differences between groups were observed. Together, these results confirm both that the epididymal fluid from mutant mice contributes to the higher DNA fragmentation levels detected in mutant sperm, and that normal epididymal fluid is not able to rescue the DNA fragmentation present in mutant cells. These results are now included in the revised version of the manuscript (see Figure 6A).

      (3) Lines 203 to 216. In these paragraphs the authors indicate "that mutant sperm had a lower percentage of fertilization and lower rates of blastocysts (Figure 3D, E), indicating that defects in egg coat penetration were not responsible for embryo development failure." Later, they indicate that a few eggs fertilized by mutant sperm failed to activate. It is shown that Ca<sup>2+</sup> oscillations are normal, indicating that the defects lie elsewhere. Could the authors propose a mechanism based on their sperm DNA defects?

      As described in the Results and Discussion sections of the original manuscript, we decided to investigate possible defects in sperm DNA fragmentation based on evidence indicating that delays in early embryo development may result from the time taken by the egg to repair damaged paternal DNA (Esbert et al., 2018; PMID: 30259705, Newman et al., 2022; PMID: 34954800, Nguyen et al., 2023; PMID: 37658763). In this regard, it is known that time is needed before the first embryonic cell division for activation of the egg DNA repair machinery (Martin et al., 2019; PMID: 30541031, Newman et al., 2022; PMID: 34954800) and that increased sperm DNA damage may require more time for repair by the oocyte (Martin et al., 2019; PMID: 30541031, Newman et al., 2022; PMID: 34954800). Based on this, we decided to examine possible DNA damage in sperm. Our finding that sperm DNA fragmentation was in fact clearly increased in mutant sperm led us to propose that delays in early embryo development in our mutant colonies may result from the time required by the egg to repair sperm DNA fragmentation.

      (4) The demonstration that C1/C3 sperm have abnormal rates of DNA fragmentation and Ca<sup>2+</sup> levels is significant. Additional studies would strengthen the findings reported here. For example, what are the levels of oxidative stress in these sperm? Are there other changes related to oxidative stress? Performing a TUNEL assay would strengthen the evidence of DNA damage obtained here with the chromatin dispersion assay.

      As mentioned previously, we analyzed oxidative stress by evaluating ROS levels in control and mutant sperm, observing no differences between genotypes. These results have been included in the revised version of the manuscript (see Figure 5C). We appreciate the suggestion of performing a TUNEL assay in future studies.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      (1) There are some reports that small RNAs gained during the epididymal transit of sperm are essential for embryonic development (e.g., Conine et al., Dev Cell, 46, 470–480, 2018; PMID: 30057276), suggesting that luminal changes in the Crisp1/3 double KO (dKO) epididymis could lead to the phenotype in this study. In fact, there is no evidence on whether CRISP1/CRISP3 secreted from the epididymis is present on cauda epididymal sperm and directly controls the observed phenomena. Also, the authors wrote that there is no strong evidence to exclude a possible role of small RNAs in Crisp1/3 dKO sperm (lines 370-372). Therefore, it is at least necessary to measure small RNA abundance in dKO mice.

      As mentioned by the reviewer and as cited in our manuscript, there is a report indicating that the small RNAs gained during epididymal transit may play a role in embryonic development (Conine et al., 2018; PMID: 30057276). However, the need for small RNAs in embryonic development remains a topic of debate (Wang et al. 2020; PMCID: PMC7799177). In this regard, clear evidence indicating that sperm DNA fragmentation is associated with embryo development defects, together with the increased DNA fragmentation levels observed in mutant sperm, supports sperm DNA damage as one of the causes of the phenotype observed in our mutant mice. Moreover, recent experiments carried out in response to Reviewer 1's comments revealed that exposure of control sperm to epididymal fluid from mutant mice significantly increases DNA fragmentation levels, confirming that the absence of CRISP1 and CRISP3 proteins in epididymal fluid contributes to DNA damage in mutant sperm. Finally, whereas oxidative stress might also impair embryo development, as mentioned in our original manuscript, recent evaluation of ROS levels in control and mutant sperm carried out in response to Reviewer 1’s comments did not show higher ROS levels in mutant sperm. Thus, although, as mentioned in the manuscript, we do not exclude the possibility that small RNAs may also contribute to embryo development defects, our observations support DNA fragmentation and a dysregulation of Ca<sup>2+</sup> homeostasis within the epididymis and sperm as the main causes of embryo development failure in our mutant males. The experiments using epididymal fluid (Figure 6A) and those evaluating ROS levels (Figure 5C) have been included in the revised version of the manuscript and discussed accordingly.

      (2) Lines 245-248 and 354-374: According to Figure 5C, the intracellular Ca<sup>2+</sup> level significantly increased in Crisp1/3 dKO sperm compared to control. The author hypothesized that this increase could destroy sperm DNA integrity, causing defects in early embryogenesis. However, the authors did not show the direct evidence.

      Specifically, as CRISP1 inhibits CatSper (line 95), the authors believed the increased Ca<sup>2+</sup> level in Crisp1/3 dKO sperm was observed. Crisp1/3 dKO and Crisp1/4 dKO mice share the disruption of Crisp1, but the phenotype is totally different. Thus, the authors should also examine the CatSper activity in Crisp1/3 dKO sperm.

      We appreciate the reviewer's insightful comments. In this regard, whereas C1/C3 and C1/C4 DKO colonies shares the disruption of Crisp1, the intracellular Ca<sup>2+</sup> levels in these two colonies are different as no increase in sperm intracellular Ca<sup>2+</sup> was detected in Crisp C1/C4 DKO mice. Thus, this difference in intracellular Ca<sup>2+</sup> levels might explain the different embryo development phenotype observed in our two DKO colonies. In this regard, our results revealed that sperm intracellular Ca<sup>2+</sup> levels are different depending on the Crisp gene being deleted. Whereas the lack of Crisp1 did not affect intracellular sperm Ca<sup>2+</sup> levels (Weigel Munoz et al, 2018; PMID: 29481619), there was an increase in Ca<sup>2+</sup> levels in CRISP2 KO sperm (Brukman et al., 2016; PMID: 26786179) and a decrease in sperm when Crisp4 was deleted (Carvajal 2019, Ph.D Thesis). Thus, although the ability of CRISP3 to regulate sperm Ca<sup>2+</sup> channels has not yet been reported, the existence of functional compensations between homologous CRISP members (Curci et al., 2020; PMID: 33037689) makes it complicated to draw straightforward conclusions based on the behavior of each individual protein in Ca<sup>2+</sup> regulation. In fact, while the lack of CRISP1 and CRISP4 does not affect sperm Ca<sup>2+</sup> concentration (Carvajal 2019, Ph.D Thesis), the simultaneous lack of CRISP1 and CRISP3 produced an increase in Ca<sup>2+</sup> levels and the lack of the four CRISP proteins showed a decrease in the intracellular levels of the cation after capacitation (Curci et al, 2020). Based on these observations, we conclude that the absence of CRISP1 may or may not lead to altered intracellular Ca<sup>2+</sup> levels depending on the other simultaneously-deleted gene/s.

      The authors hypothesize that the increased Ca<sup>2+</sup> level may damage DNA integrity, citing a published paper (lines 360-363). In that paper, the authors examined the influence of luminal fluid from the epididymis and vas deferens on sperm chromatin fragmentation (Gawecka et al., 2015). However, they did not mention increased DNA fragmentation in epididymal sperm incubated with Ca<sup>2+</sup> or Mn<sup>2+</sup>. The authors' hypothesis therefore overreaches in the Discussion, and the correlation between intracellular Ca<sup>2+</sup> level and DNA integrity in sperm remains unclear. The authors should provide direct evidence showing why the increased Ca<sup>2+</sup> level leads to DNA fragmentation.

      We appreciate the reviewer’s comment regarding the work by Gawecka et al. (2015), and the opportunity to clarify the proposed mechanism underlying our observations. In the above-mentioned paper, the authors reported that when mouse epididymal or vas deferens sperm were incubated with divalent cations (Ca<sup>2+</sup> and Mn<sup>2+</sup>) in the presence of luminal fluid, they were induced to degrade their DNA in a process termed sperm chromatin fragmentation (SCF). Because both the ejaculated and epididymal mutant sperm used in our studies had been exposed to epididymal fluid lacking CRISP proteins known to regulate sperm Ca<sup>2+</sup> channels, changes in Ca<sup>2+</sup> levels within the epididymal fluid and/or sperm could be responsible for the higher DNA fragmentation levels observed in mutant cells. In this regard, as requested by Reviewer 1, we performed additional in vitro experiments in which WT epididymal sperm were exposed to mutant or WT epididymal fluid in the presence or absence of Ca<sup>2+</sup>, and DNA fragmentation was analyzed at the end of incubation. Results showed a significant increase in DNA fragmentation in WT sperm exposed to either mutant epididymal fluid or WT fluid in the presence of Ca<sup>2+</sup> (Figure 6A). We believe these observations, together with the higher intracellular Ca<sup>2+</sup> levels detected in DKO sperm (Figure 6B), provide strong evidence that changes in Ca<sup>2+</sup> homeostasis in the epididymis and sperm are mainly responsible for the observed sperm DNA integrity defects. This could be mediated by the activation of Ca<sup>2+</sup>-dependent nucleases present within the epididymal fluid and/or sperm cells, as previously suggested (Shaman et al., 2006; PMID: 16914690, Sotolongo et al., 2005; PMID: 15713834, Boaz et al., 2008; PMID: 17879959, Dominguez and Ward, 2009; PMID: 19938954).
These observations have now been included and discussed in the revised version of the manuscript (see lines 245-265 and 427-439).

      Minor Comments:

      (3) The criteria used to calculate rates should be clarified; for example, whether two-cell rates are determined by dividing the number of two-cell embryos by the total number of eggs.

      As requested, the criteria used to calculate rates have now been clarified in the corresponding figure legends.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript, the authors investigate the role of BEND2, a novel regulator of meiosis, in both male and female fertility. Huang et al. have created a mouse model in which the full-length BEND2 transcript is depleted but the truncated BEND2 version remains. This mouse model is fertile, and the authors used it to study the role of BEND2 in both male and female meiosis. Overall, full-length BEND2 appears dispensable for male meiosis. The more interesting phenotype was observed in females, which exhibit a lower ovarian reserve, suggesting that full-length BEND2 is involved in the establishment of the primordial follicle pool. 

      Strengths: 

      The authors generated a mouse model that enabled them to study the role of BEND2 in meiosis. The role of BEND2 in female fertility is novel and enhances our knowledge of genes involved in the establishment of the primordial follicle pool. 

      Weaknesses: 

      The manuscript extensively explores the role of BEND2 in male meiosis; however, a more interesting result was obtained from the study of female mice. 

      We sincerely appreciate the reviewer’s thoughtful evaluation of our work and recognition of the strengths of our study. We are especially grateful for the acknowledgment of the novelty of our findings regarding the role of BEND2 in female fertility. While we extensively characterized the effects of BEND2 depletion in male meiosis, we agree that the phenotype observed in females provides particularly interesting insights into the establishment of the primordial follicle pool. 

      Reviewer #2 (Public review): 

      In their manuscript entitled "BEND2 is a crucial player in oogenesis and reproductive aging", the authors present their findings that full-length BEND2 is important for meiotic double-strand break repair in spermatocytes, regulation of LINE-1 elements in spermatocytes, and proper oocyte meiosis and folliculogenesis in females. The manuscript utilizes an elegant system to specifically ablate the full-length form of BEND2, which has historically been difficult to study due to its location on the X chromosome and the male sterility of global knockout animals. 

      The authors have been extremely responsive to reviewer critiques and have presented strong data and appropriate conclusions, making it an excellent addition to the field. 

      We are truly grateful for the reviewer’s thoughtful review and recognition of the key contributions of our study. We appreciate the acknowledgment of how our model overcomes the challenges in studying BEND2 and the importance of our findings in both male and female meiosis. We also value the reviewer’s encouraging comments on our responsiveness to their feedback and the quality of our data and conclusions.

      Reviewer #3 (Public review): 

      Huang et al. investigated the phenotype of Bend2 mutant mice, which express only the truncated isoform. Males carrying the Bend2 deletion were fertile, which enabled the authors to analyze BEND2 function in females. They showed that females carrying the Bend2 deletion have fewer follicles, which may lead to loss of the ovarian reserve. 

      Strengths: 

      They identified the truncated isoform of Bend2 and found that depletion of this isoform decreased follicle number at birth. 

      Weaknesses: 

      The authors identified a novel factor that affects the ovarian reserve. Although the number of follicles and the conception rate are reduced in mutant mice, the in vitro fertilization rate is normal and follicles remain at 40 weeks of age. It is difficult to know how critical this is when applied to humans. 

      We greatly appreciate the reviewer’s comments and recognition of the strengths of our work. We are grateful for their acknowledgment of our findings related to the truncated isoform of Bend2 and its effect on ovarian reserve. We also agree that, although our study provides important insights, we are still far from directly applying these results to human clinical scenarios. Much further research is needed before these findings can be translated. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      The authors have addressed all concerns both editorially and experimentally. This is a very nice manuscript, and I congratulate the authors on their work. 

      We sincerely appreciate your kind words and thoughtful review. Your feedback has been invaluable in improving our manuscript, and we are grateful for your time and effort. Thank you for your support and encouragement!

      Reviewer #2 (Recommendations for the authors): 

      In Figure 3, graphs in panels C & D have typos in the early zygotene column where it reads "zyotene". 

      We appreciate your careful review and thank you for pointing out the typos in Figure 4, which have been corrected in the new version of the manuscript. 

      Reviewer #3 (Recommendations for the authors): 

      ・Since there are two isoforms of Bend2 and the authors depleted one isoform, it is not appropriate to use "full length" in the title and in the manuscript. 

      We respectfully disagree with the reviewer’s comment. In our mouse model, we specifically remove the full-length isoform of Bend2. Therefore, we consider it appropriate to refer to it as such in the manuscript. Our results indicate that the full-length isoform is not required to complete meiotic prophase in males but is indispensable for setting up the ovarian reserve in females. We appreciate the reviewer’s input and are happy to clarify this point further if needed.

      ・Is there any reason why the authors used 7-month-old females for in vitro fertilization? These may not be considered aged mice, but they seem a bit old for IVF, especially when the ovarian reserve in mutant mice is decreased. If there is a reason, please clarify it. In addition, since the authors added IVF data showing a similar fertilization ratio between control and mutant, they need to discuss why litter size was decreased in mutant mice. It may be too strong to conclude "subfertility". 

      We used 7-month-old females for IVF because this falls within the age range of the samples analyzed for ovarian reserve, with the oldest females being 8 months old. Regarding the apparent discrepancy between IVF results and litter size, we addressed this in the discussion section of the manuscript: 'Interestingly, our mutant oocyte quality analysis suggests that mature oocytes from mutant females are equally competent to develop into a blastocyst as control ones. These data suggest that the subfertility observed in Bend2 mutants may be due to errors in later developmental stages, such as implantation or organogenesis.' We appreciate the reviewer’s feedback and hope this clarification helps.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Turi, Teng and the team used state-of-the-art techniques to provide convincing evidence of the infraslow oscillation of DG cells during NREM sleep, and of how serotonergic innervation modulates hippocampal activity patterns during sleep and memory. First, they showed that glutamatergic DG cells become activated following an infraslow rhythm during NREM sleep. In addition, the infraslow oscillation in the DG is correlated with rhythmic serotonin release during sleep. Finally, they found that specific knockdown of 5-HT receptors in the DG impairs the infraslow rhythm and memory, suggesting that serotonergic signaling is crucial for regulating DG activity during sleep. Given that the functional role of the infraslow rhythm remains to be studied, their findings deepen our understanding of the role of DG cells and serotonergic signaling in regulating the infraslow rhythm, sleep microarchitecture and memory.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated DG neuronal activity at the population and single-cell level across sleep/wake periods. They found an infraslow oscillation (0.01-0.03 Hz) in both granule cells (GC) and mossy cells (MC) during NREM sleep. The important findings are 1) the antiparallel temporal dynamics of DG neuron activity and serotonin neuron activity/extracellular serotonin levels during NREM sleep, and 2) the Htr1a-mediated infraslow oscillation of GCs.

      Strengths:

      (1) The combination of polysomnography, Ca-fiber photometry, two-photon microscopy and gene depletion is technically sound. The coincidence of microarousals and dips in DG population activity is convincing. The dip in activity in upregulated cells is responsible for the dip at the population level.

      (2) DG GCs express excitatory Htr4 and Htr7 in addition to inhibitory Htr1a, but deletion of Htr1a is sufficient to disrupt DG GC infraslow oscillation, supporting the importance of Htr1a in DG activity during NREM sleep.

      Weaknesses:

      (1) The current data set and analysis are insufficient to interpret the observations correctly.

      a. In Fig 1A, during NREM, the peaks and troughs of GC population activity seem to decrease gradually over time. Please address this point.

      b. In Fig 1F, about 30% of Ca dips coincided with MAs (EMG increases) and 60% of Ca dips did not coincide with an EMG increase. If this is true, readers should be able to find 8 Ca dips in Fig 1E that are not associated with MAs. If MAs were clustered, please describe this properly.

      c. In Fig 1F, the legend states the percentage during NREM. If the authors want to include the percentages during wake and REM, please show traces with Ca dips during wake and REM. This concern applies to all pie charts provided by the authors.

      d. In Fig 1C, please provide line plots connecting the same session. This request applies to all related figures.

      e. In Fig 2C, the significant increase during REM and the unchanged level during NREM are not convincing. In Fig 2A, several bouts of increased EMG do not appear to be MAs but rather wakefulness, because the EMG increase lasts longer than 15 seconds. Therefore, wake bouts may have been mixed with NREM bouts, leading to the decrease in Ca activity during NREM. In fact, in Fig 2E, the 4th MA bout seems to be a wake bout because the EMG increase lasts more than 15 seconds.

      f. The Fig 5D REM data are interesting because DRN activity is stably silenced during REM. The variable correlation therefore implies variable DG activity during REM. The authors need to address this.

      g. In Fig 6, the authors should show the impact of DG Htr1a knockdown on sleep/wake structure including the frequency of MAs. I agree with the impact of Htr1a on DG ISO, but possible changes in sleep bout may induce the DG ISO disturbance.

      (2) It is acceptable that DG Htr1a KO induces the reduced freezing in the CFC test (Fig. 6E, F), but it is too much of a stretch that the disruption of DG ISO causes impaired fear memory. There should be a correlation.

      (3) It is necessary to describe the extent of AAV-Cre infection. The authors injected AAV into the dorsal DG (AP -1.9 mm), but the histology shows the ventral DG (Supplementary Fig. 4), which reduces the reliability of this study.

      Responses to weaknesses mentioned above have been addressed in the first revision.

      Comments on revisions:

      In the first revision, I pointed out the inappropriate analysis of the EEG/EMG/photometry data and gave examples. The authors responded only to the points raised and did not seem to see the need to improve the overall analysis and description. In this second revision, I would like to ask the authors to improve them. The biggest problem is that the detection criteria and the quantification of specific events are not described at all in the Methods, making the statements extremely difficult to follow. All interpretations are based on inappropriate data analysis; therefore, I have to say that the statements are not supported by the data.

      Please read my following concerns carefully and improve them.

      (1) The definition of each event is critical to its detection and to the subsequent analysis. In particular, the authors should explicitly describe the definition of MA (microarousal), the trough and peak of the population-level intracellular Ca concentration, and the onset of the decline and surge of Ca levels.

      (1-1) The authors categorized wake bouts of <15 seconds with high EMG activity as MA (in Methods). What degree of high EMG is relevant to MA and what is the lower limit of high EMG? In Fig 1E, there are some EMG spikes, but it was unclear which spike/wave (amplitude/duration) was detected as MA-relevant spike and which spike was not detected. In Fig 2E, the 3rd MA coincides with the EMG spike, but other EMG spikes have comparable amplitude to the 3rd MA-relevant EMG spike. Correct counting of MA events is critical in Fig 1F, 2F, 4C.

      We have added more information about the MA definition in the Methods, including EMG amplitude. Furthermore, we have re-analyzed MAs and MA-related calcium signals in Fig1 and Fig2. Fig-S1 shows the EMG amplitude traces for all MA events shown in Fig1G and Fig2G.

      (1-2) Please describe the definition of Ca trough in your experiments. In Fig 1G, the averaged trough time is clear (~2.5 s), so I can acknowledge that MA is followed by Ca trough. However, the authors state on page 4 that "30% of the calcium troughs during NREM sleep were followed by an MA epoch". This discrepancy should be corrected.

      We apologize for the misleading statement. We meant 30% of ISO events during NREM sleep, and we have corrected this. To detect the calcium trough of each ISO, we first calculated a moving baseline (blue line in Fig-S2 below) by smoothing the calcium signal over 60 s, then set a threshold (0.2 standard deviations below the moving baseline) to identify events of calcium decrease, and finally took the minimum point (red dots in Fig-S2) in each event as the calcium trough. We have added these details to the Methods.
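      The trough-detection procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the sampling rate `fs`, a centered moving average as the 60 s smoothing method, and taking the standard deviation of the baseline-subtracted trace are all assumptions, and `moving_baseline` and `detect_troughs` are hypothetical names.

```python
import numpy as np

def moving_baseline(signal, fs, window_s=60.0):
    """Centered moving average over `window_s` seconds (assumed smoothing method).

    Dividing by the per-sample window count avoids the downward bias that
    zero padding would otherwise introduce at the edges of the recording.
    """
    signal = np.asarray(signal, dtype=float)
    n = min(max(1, int(round(window_s * fs))), len(signal))
    kernel = np.ones(n)
    total = np.convolve(signal, kernel, mode="same")
    count = np.convolve(np.ones_like(signal), kernel, mode="same")
    return total / count

def detect_troughs(signal, fs, window_s=60.0, k=0.2):
    """Return sample indices of calcium troughs.

    An event is a contiguous run where the signal lies more than k standard
    deviations below the moving baseline (SD of the baseline-subtracted trace,
    an assumption); the trough is the minimum of the signal within each event.
    """
    signal = np.asarray(signal, dtype=float)
    residual = signal - moving_baseline(signal, fs, window_s)
    below = residual < -k * np.std(residual)

    # Locate the start (inclusive) and end (exclusive) of each run of True values
    edges = np.diff(np.concatenate(([0], below.astype(int), [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)

    troughs = [s + int(np.argmin(signal[s:e])) for s, e in zip(starts, ends)]
    return np.array(troughs, dtype=int)
```

      Because the threshold is relative to a slowly moving baseline, slow drifts in the photometry signal are not counted as troughs; only dips that are fast relative to the 60 s window are detected.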

      (1-3) Relating to comment 1-2, I agree that the latency is between the MA and the Ca trough on page 4, as the authors explain in the Methods, but in Fig 1G the latency t is labeled at an incorrect position. Please correct this.

      We are sorry for the mistake in describing the latency in the Methods. The latency was defined as the time difference between the onset of calcium decline (see details below in 1-4) and the onset of the MA. We have corrected this in the revised manuscript. Thus, the labeling in Fig1G was correct.

      (1-4) The authors may want to determine the onset of the decline in population Ca activity and the latency between onset and trough (Fig 1G, latency t). If so, please describe how the onset of the decline is determined. In Fig 1G, 2G and S6, I can see a horizontal dashed line and infer that the intersection of this line with the Ca curve is considered the onset. However, I have to say that the placement of this horizontal line is highly arbitrary. The results (t and Drop) depend strongly on the position of the horizontal line, so the authors need to describe how it was set.

      Indeed, we used the onset of calcium decline to calculate the latency, as mentioned above. First, we defined the baseline (dashed line in Fig1G) as the average calcium signal in the 10 s window before the MA (from -15 s to -5 s in Fig1G). The onset of calcium decline was defined as the time point at which the calcium signal had decreased by more than 0.05 SD from this baseline. We have added these details to the Methods.
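      The baseline and decline-onset rule described above can likewise be expressed as a short function. Again this is only a sketch: which SD the 0.05 threshold refers to (here an `sd` value passed in by the caller, for example the SD of the whole trace), where the onset search starts (here at the end of the baseline window, -5 s), the 15 s search limit, and the sign convention of the latency are assumptions, and `decline_onset_latency` is a hypothetical helper.

```python
import numpy as np

def decline_onset_latency(signal, fs, ma_onset_idx, sd,
                          pre=(-15.0, -5.0), k=0.05, search_s=15.0):
    """Latency (s) from MA onset to the onset of calcium decline, or None.

    Baseline = mean of the signal in the `pre` window relative to MA onset
    (-15 s to -5 s). The decline onset is the first sample that falls more
    than k * sd below this baseline.
    """
    signal = np.asarray(signal, dtype=float)
    i0 = ma_onset_idx + int(round(pre[0] * fs))
    i1 = ma_onset_idx + int(round(pre[1] * fs))
    if i0 < 0 or i1 <= i0:
        raise ValueError("baseline window falls outside the recording")
    baseline = signal[i0:i1].mean()
    threshold = baseline - k * sd

    # Assumption: search from the end of the baseline window up to
    # search_s seconds after MA onset.
    stop = min(len(signal), ma_onset_idx + int(round(search_s * fs)))
    below = np.flatnonzero(signal[i1:stop] < threshold)
    if below.size == 0:
        return None
    onset_idx = i1 + int(below[0])
    # Signed latency: negative if the decline starts before the MA (assumed convention)
    return (onset_idx - ma_onset_idx) / fs
```

      Writing the rule this way makes explicit the two free parameters the reviewer asked about (baseline window and threshold), so a reader can check how sensitive the latency is to each.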

      (1-5) In order to follow Fig 1F correctly, the authors need to indicate the detection criteria of "Ca dip (in legend)". Please indicate "each Ca dip" in Fig 1E. As a reader, I would like to agree with the Ca dip detection of this Ca curve based on the criteria. Please also indicate "each Ca dip" in Fig 2E and 2F. In the case of the 2nd and 3rd MAs, do they follow a single Ca dip or does each MA follow each Ca dip? This chart is highly dependent on the detection criteria of Ca dip.

      We have indicated each Ca dip in Fig 1 and Fig 2.

      As I mentioned above, most of the quantifications are not based on the clear detection criteria. The authors need to re-analyze the data and fix the quantification. Please interpret data and discuss the cellular mechanism of ISO based on the re-analyzed quantification.

      As suggested, we have re-analyzed the MA and MA-related photometry signals. Accordingly, parts of Fig1 and Fig2 have been revised. Although there are some small changes, the main results and conclusions remain unchanged.

      Reviewer #3 (Public review):

      Summary:

      The authors employ a series of well-conceived and well-executed experiments involving photometric imaging of the dentate gyrus and raphe nucleus, as well as cell-type-specific genetic manipulations of serotonergic receptors, that together directly implicate serotonergic regulation of dentate gyrus (DG) granule cell (GC) and mossy cell (MC) activity in association with an infraslow oscillation (ISO) of neural activity that has previously been linked to general cortical regulation during NREM sleep and microarousals.

      Strengths:

      There are a number of novel and important results, including the modulation of dentate granule cell activity by the infraslow oscillation during NREM sleep, the selective association of different subpopulations of granule cells with microarousals (MA), and the anticorrelation of raphe activity with infraslow dentate activity.

      The discussion includes a general survey of ISOs and recent work relating to their expression in other brain areas and the potential involvement of other neuromodulatory systems, as well as possible connections between infraslow oscillations, microarousals, and sensory sensitivity.

      Weaknesses:

      - The behavioral results showing contextual memory impairment resulting from 5-HT1a knockdown are fine, but are over-interpreted. The term memory consolidation is used several times, as well as references to sleep-dependence. This is not what was tested. The receptor was knocked down, and then 2 weeks later animals were found to have fear conditioning deficits. They can certainly describe this result as indicating a connection between 5-HT1a receptor function and memory performance, but the connection to sleep and consolidation would just be speculation. The fact that 5-HT1a knockdown also impacted DG ISOs does not establish dependency. Some examples of this are:

      – The final conclusion asserts "Together, our study highlights the role of neuromodulation in organizing neuronal activity during sleep and sleep-dependent brain functions, such as memory.", but the reported memory effects (impairment of fear conditioning) were not shown to be explicitly sleep-dependent.

      – Earlier in the discussion it mentions "Finally, we showed that local genetic ablation of 5-HT1a receptors in GCs impaired the ISO and memory consolidation". The effect shown was on general memory performance - consolidation was not specifically implicated.

      – The assertion on page 9 that the results demonstrate "that the 5-HT is directly acting in the DG to gate the oscillations" is a bit strong given the magnitude of effect shown in Fig. 6D, and the absence of demonstration of negative effect on cortical areas that also show ISO activity and could impact DG activity (see requested cortical sigma power analysis).

      – Recent work has shown that abnormal DG GC activity can result from the specific Ca indicator used here (GCaMP6s) (Teng, S., Wang, W., Wen, J.J.J. et al. Expression of GCaMP6s in the dentate gyrus induces tonic-clonic seizures. Sci Rep 14, 8104 (2024). https://doi.org/10.1038/s41598-024-58819-9). The authors of that study found that the effect seemed to be specific to GCaMP6s and that GCaMP6f did not lead to abnormal excitability. This is of particular concern given the similar infraslow variation of cortical excitability in epilepsy (cf. Vanhatalo et al., PNAS 2004). While I don't think the experiments need to be repeated with a different indicator to address this concern, you should be able to use the 2p GCaMP7 experiments that have already been done to provide additional validation by repeating the analyses done for the GCaMP6s photometry experiments. This should be done anyway to allow appropriate comparison of the 2p and photometry results.

      – While the discussion mentions previous work that has linked ISOs during sleep with regulation of cortical oscillations in the sigma band, oddly no such analysis is performed in the current work, even though the data are presumably available and would be highly relevant to the interpretation of a number of primary results, including the relationship between the ISOs and MAs observed in the DG and similar results reported in other areas, as well as the selective impact of DG 5-HT1a knockdown on DG ISOs. For example, in the initial results describing the cross-correlation of calcium activity and EMG/EEG with MA episodes (paragraph 1, page 4), similar results relating brief arousals to the infraslow fluctuation in sleep spindles (sigma band), also at 0.02 Hz and associated with variation in sensory arousability, have been reported (cf. Cardis et al., "Cortico-autonomic local arousals and heightened somatosensory arousability during NREMS of mice in neuropathic pain", eLife 2021). It would be important to know whether the current results show similar cortical sigma band correlations. Also, in the results on ISO attenuation following 5-HT1a knockdown on page 7 (Fig. 6), how is the cortical EEG affected? Is the ISO still seen in the EEG but attenuated in the DG?

      – The illustrations of the effect of 5-HT1a knockdown shown in Figure 6 are somewhat misleading. The examples in panels B and C show an effect that is much more dramatic than the overall effect shown in panel D, so they do not appear to be representative examples. Which of the sample points in panel D are illustrated in panels B and C? It is not appropriate to arbitrarily select two points from different animals for comparison, or worse, to take points from the extremes of the distributions. If the intent is to illustrate what the effect shown in D looks like in the raw data, then you need to select examples that reflect the means shown in panel D. It is also important to show the effect on the cortical EEG, particularly in the sigma band, to see whether the effects are restricted to the DG ISOs. It would also be helpful to show that MAs and their correlations as shown in Fig 1 or G, as well as broader sleep architecture, are not affected.

      – On page 9 of the results it states that GCs and MCs are upregulated during NREM and their activity is abruptly terminated by MAs through a 5-HT mediated mechanism. I didn't see anything showing the 5-HT dependence of the MA activity correlation. The results indicate a reduction in ISO modulation of GC activity but not the MA correlated activity. I would like to see the equivalent of Fig 1,2 G panels with the 5-HT1a manipulation.

      Responses to Reviewer #3 have been addressed in the first revision. 

      Reviewer #1 (Recommendations for the authors):

      Minor comment: "Several recent publications from different laboratories have shown rhythmic release of norepinephrine (NE) (~0.03 Hz) in the medial prefrontal cortex, the thalamus, and in the locus coeruleus (LC) of the mouse during sleep-wake cycles." Please add "preoptic area" here.

      We have added the citation.

      Reviewer #2 (Recommendations for the authors):

      Minor

      (1) (abstract, page 2 line 9) what kind of "increased activity" did the authors find?

      Increased activity compared to that during wakefulness. We have added this.

      (2) (result, page 4) please define first, early, and late stage of NREM sleep in the methods.

      We have added these in the Methods.

      (3) (result, page 6) please define "the risetime of the phasic increase".

      It refers to the latency between the increase of 5-HT and the MA onset. We have clarified this in the text.

      (4) (supplement Fig 3 legend) please reword "5-HT events" and "5-HT signals" because these are ambiguous.

      We have defined the events in the legend.

      (5) (Fig 5A) please replace the picture without bubbles.

      We have replaced the image in Fig5A.

    1. Author response:

      Reviewer 1:

      A primary limitation of this study, acknowledged by the authors, is its reliance on self-reports of participants’ emotional states. Although considerable effort was made to minimize expectation effects, further research is needed to confirm that the observed behavioral changes reflect genuine alterations in emotional states.

      Thank you very much for raising this point. We fully agree that self-reported emotional states are inherently subjective and that the ramifications of this need to be clarified in the manuscript. However, we would suggest that the focus on self-report may be a strength rather than a limitation. First, the regularities and rules underlying and determining emotional self-report are of primary importance and interest in their own right, and the work presented here does, we believe, shed light on a rich structure present in multivariate timeseries of subjective self-reports and their response to external inputs. Second, there is no clear definition of what a “genuine emotion state” might be, particularly if there is a discrepancy with self-reported emotions.

      Additionally, the generalizability of the findings to long-term remediation strategies remains an open question.

      Yes, we agree that what we have described is limited to a short-term intervention and change.

      Whether these changes bear on longer-term changes remains to be assessed. Furthermore, the mechanisms or processes that would support such maintenance are of substantial interest and will be the focus of future work.

      Second, the statistical analysis, particularly the computational approach, sometimes lacks sufficient detail and refinement. While I will not elaborate on specific points here, one notable issue is the interpretation of the intrinsic matrix (A). The model-free analysis reveals correlations between emotions at a given time or within an emotional state across time points. However, it does not provide evidence of lagged interactions across states that would justify non-diagonal elements in A. The other result concerning the dynamics matrix only highlights a trend in the dominant eigenvalue, which is difficult to interpret in isolation. Furthermore, the absence of a statistically significant group x intervention interaction makes this finding less compelling. This weakens the study’s conclusions about the importance of intrinsic dynamics, as claimed in the title.

      We appreciate the reviewer’s detailed feedback on the statistical analysis and interpretation of the intrinsic dynamics matrix. It is true that the model-free analysis as presented focuses on within-state correlations and that we have not provided such model-free evidence for lagged interactions across states. We do note that the model comparison suggested that the intervention caused changes in the full A matrix. This would be unlikely if there had not been meaningful cross-emotion lagged effects. Similarly, inference of the A matrix could have revealed a diagonal matrix, and we preferred not to impose such an assumption a priori, as it is very restrictive. Nevertheless, in the absence of a statistically significant group x intervention interaction, the findings regarding the A matrix are less compelling than those related to the control analyses. While this is likely due to a lack of statistical power, these are important points which we will consider in more detail in the revision.

      Finally, to avoid potential misunderstandings of their work, the authors should be more careful about their use of terms pertaining to control theory and take the time to properly define them. For example, the “controllability” of emotional states can denote either that those states are more changeable (control theory definition) or, conversely, more tightly regulated (common interpretation, as used in the abstract). This is true for numerous terms (stability, sensitivity, Gramian, etc.) for which no clear definition or references are provided. Readers unfamiliar with the framework of control theory will likely be at a loss without more guidance.

      Thank you for this point. We recognize the potential for misunderstanding due to the dual usage of terms such as “controllability” and will improve the clarity to avoid any misunderstanding.

      Reviewer 2:

      Acquiring data online inevitably gives rise to selection and self-selection effects. This needs to be acknowledged clearly. Exacerbating this, participant remuneration seems low at an amount below the minimum or living wage in Western countries (do the authors know where their participants came from?).

      Thank you for this point. We certainly agree that different experimental settings can induce different biases, and online settings are no exception. However, online tasks such as the one used here have become accepted, and there is now a substantial literature showing that in-lab effects are often well replicated in online settings (Gillan and Rutledge, 2021). For the current study, it is not clear that an in-person setting would not induce comparably complex biases, e.g. to do with differences between experimenters. All participants were from the UK. Remuneration rates were comparable to other experimental settings, in keeping with other online studies and UK living wage recommendations, and were ultimately determined according to institutional ethical guidance.

      Another concern is that the intervention does not simply take place before the second block begins but is ongoing during the whole of the second block in that it is integrated into the phrasing of the task on each trial. It is therefore somewhat misleading to speak of a period ’after the intervention’, and it would have been interesting to assess the effect of this by including a third group where the phrasing does not change, but the floating leaves intervention takes place.

      Thank you for this point. We acknowledge that the phrasing of the emotion question in the second block may have influenced the observed effects. Including a third group without the reminder would have provided valuable insights and is an important consideration for future studies. We will acknowledge this limitation.

      As mentioned in the Limitations section, observation noise was assumed and not estimated. While this is understandable in this case, the effect of this assumption could have been assessed by simulation with varying levels of observation (and process) noise.

      Thank you for this comment. We would like to clarify that both observation noise and process noise were estimated in the analyses. We will ensure this is emphasized better in the revised version to avoid future misunderstandings.
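      To illustrate how both noise terms can be estimated rather than fixed, here is a minimal sketch of a scalar linear-Gaussian state-space model: x(t) = a x(t-1) + w(t) with process-noise variance q, and y(t) = x(t) + v(t) with observation-noise variance r. The Kalman filter gives the exact log-likelihood through the prediction-error decomposition, so q and r can both be chosen by maximising it. A coarse grid search stands in here for whatever optimiser was actually used, and the study's multivariate model is reduced to one dimension; all names and values are hypothetical. The same functions could serve for the simulation the reviewer proposes, by refitting data generated at several noise levels.

```python
import math
import random

def kalman_loglik(y, a, q, r, x0=0.0, p0=1.0):
    """Exact Gaussian log-likelihood of observations y under
    x(t) = a*x(t-1) + w, w ~ N(0, q);  y(t) = x(t) + v, v ~ N(0, r)."""
    x, p, ll = x0, p0, 0.0
    for obs in y:
        x_pred = a * x            # predicted state mean
        p_pred = a * a * p + q    # predicted state variance (process noise enters here)
        s = p_pred + r            # innovation variance (observation noise enters here)
        e = obs - x_pred          # innovation
        ll -= 0.5 * (math.log(2 * math.pi * s) + e * e / s)
        k = p_pred / s            # Kalman gain
        x = x_pred + k * e        # filtered state mean
        p = (1 - k) * p_pred      # filtered state variance
    return ll

def simulate(n, a, q, r, seed=0):
    """Draw n observations from the same model (hypothetical parameters)."""
    rng = random.Random(seed)
    x, ys = 0.0, []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, math.sqrt(q))
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))
    return ys

def fit_noise(y, a, grid):
    """Pick the (q, r) pair on the grid with the highest log-likelihood."""
    pairs = [(q, r) for q in grid for r in grid]
    return max(pairs, key=lambda qr: kalman_loglik(y, a, qr[0], qr[1]))

GRID = [0.1, 0.25, 0.5, 1.0, 2.0, 4.0]
y = simulate(2000, a=0.8, q=0.5, r=1.0, seed=1)
q_hat, r_hat = fit_noise(y, a=0.8, grid=GRID)
print(q_hat, r_hat)
```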

      Relatedly, the reliance on formal model comparison is unfortunate since the outcome of such comparisons is easily influenced by slight changes to assumptions such as noise levels. An alternative approach would have been to develop a favoured model based on its suitability to address the research question and its ability, established by simulation, to distill relevant changes of behaviour into reliable parameter estimates.

      We agree that model comparison alone is insufficient. This is why we have also included extensive simulations, including posterior predictive checks, and have followed established best-practice procedures (Wilson and Collins, 2019). We have focused on a relatively simple model space to avoid overfitting to the dataset, and hence reduce the risk of spurious findings. While we agree that outcomes will be influenced by underlying assumptions, this would persist with the suggested approach of relying on a favoured model. Simulations themselves rely on predefined structures and noise specifications, which inherently shape parameter recovery and inference. Relying only on a favoured model might risk model misspecification, whereby the model may not actually capture the data, and the parameters intended to capture the intervention effect could be confounded. We will clarify the reasoning behind our approach in the revised version.
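      A parameter-recovery check of the kind recommended by Wilson and Collins (2019) can be sketched as follows: simulate data from known parameter values, refit, and compare the recovered values with the true ones. This minimal version assumes a hypothetical first-order autoregressive model, x(t) = a x(t-1) + noise, fitted by ordinary least squares, rather than the full model used in the study.

```python
import random

def simulate_ar1(n, a, sigma, seed):
    """Simulate x(t) = a*x(t-1) + N(0, sigma^2) noise, starting from 0."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(a * x[-1] + rng.gauss(0.0, sigma))
    return x

def estimate_a(x):
    """Ordinary least-squares estimate of a from lagged pairs (x(t-1), x(t))."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

def recovery(true_values, n=2000, sigma=1.0):
    """Return (true, recovered) pairs, one fresh simulation per true value."""
    return [(a, estimate_a(simulate_ar1(n, a, sigma, seed=i)))
            for i, a in enumerate(true_values)]

for a_true, a_hat in recovery([0.0, 0.3, 0.6, 0.9]):
    print(f"true a = {a_true:.1f}, recovered a = {a_hat:.3f}")
```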

      The statistical analyses clearly show the limitations of classical statistical testing with highly complex models of the kind the authors (commendably) use. Hunting for statistically significant interactions in a multivariate repeated-measures design relying on inputs from time-series-derived point estimates is a difficult proposition. While the authors make the best of the bad situation they create by using null-hypothesis significance testing, a more promising approach would have been to estimate parameters using a sampler like Stan or PyMC and then draw conclusions based on posterior predictive simulations.

      This comment raises several interesting points. First, we agree that the value of classical tests on individual parameters in such complex situations is limited. This is why our main focus is on global measures like model comparison. Our use of the classical tests is more to support the understanding of the nature of the data, i.e. they have a more descriptive aim. We hope to clarify this further in the revision. Second, in terms of sampling, we would like to emphasize that the Kalman filter is both efficient and analytically tractable, making it well suited to our data and research question. It may have been possible to use sampling to obtain posterior distributions rather than point estimates. However, we did not judge this to be worth the (substantial) additional computational cost.

      Reviewer 3:

      An interesting but perhaps at present slightly confusing aspect of their described results relates to the ‘controllability’ of emotions, which they define as their susceptibility to external inputs. Readers should note this definition is (as I understand it) quite distinct from, and sometimes even orthogonal to, concepts of emotional control in the emotion literature, which refer to intentional control of emotions (by emotion regulation strategies such as distancing). The authors also use this second meaning in the discussion. Because of the centrality of control/controllability (in both meanings) to this paper, at present it is key for readers to bear these dual meanings in mind for juxtaposed results that distancing “reduces controllability” while causing “enhanced emotional control”.

      We fully agree with the reviewer’s observation that “controllability” can be interpreted in different ways. We will revise the text to ensure consistent usage and explicitly state the distinction between the control theory definition of controllability and its interpretation in the emotion regulation literature.

      As above, the authors use an active control (a relaxation intervention) which is extremely closely matched with their active intervention (and a major strength). However, there was an additional difference between the groups (as I currently understand it): “in the group allocated to the distancing intervention, the phrasing of the question about their feelings in the second video block reminded participants about the intervention, stating: ‘You observed your emotions and let them pass like the leaves floating by on the stream.’” I do wonder if the effects of distancing may also have been partially driven by some degree of reappraisal (considered a separate emotion regulation strategy), since this reminder might have evoked retrospective changes in ratings.

      We appreciate this substantial point. While our study was designed to isolate the effects of distancing, we acknowledge that elements of reappraisal may also have influenced the results. We will discuss this in the revised version. Additionally, as noted in our response to Reviewer 2, including a third group without the reminder could have provided valuable information, and we consider this to be an important direction for future research.

      Not necessarily a weakness, but an unanswered question is exactly how distancing is producing these effects. As the authors point out, there is a possibility that eye-movement avoidance of the more emotionally salient aspects of scenes could be changing participants’ exposure to the emotions somewhat. Not discussed by the authors, but possibly relevant, is the literature on differences between emotion types on oculomotor avoidance, which could have contributed to differential effects on different emotions.

      Thank you very much for these suggestions. It is very true that different emotions can elicit different patterns of oculomotor avoidance, which could have contributed to our observed effects. Research suggests that emotions such as disgust are associated with visual avoidance (Armstrong et al., 2014; Dalmaijer et al., 2021), whereas anxiety and other negative emotions exhibited increased attentional bias after fear conditioning (Kelly and Forsyth, 2009; Pischek-Simpson et al., 2009). It would be very interesting to repeat the experiment with eye-tracking to examine these possibilities. What would be particularly interesting to examine is whether a distancing intervention induces multiple, emotionally-specific behaviours, or not.

      References

      Armstrong, T., McClenahan, L., Kittle, J., and Olatunji, B. O. (2014). Don’t look now! Oculomotor avoidance as a conditioned disgust response. Emotion, 14(1):95–104.

      Dalmaijer, E. S., Lee, A., Leiter, R., Brown, Z., and Armstrong, T. (2021). Forever yuck: Oculomotor avoidance of disgusting stimuli resists habituation. Journal of Experimental Psychology: General, 150(8):1598–1611.

      Gillan, C. M. and Rutledge, R. B. (2021). Smartphones and the Neuroscience of Mental Health. Annual Review of Neuroscience, 44:129–151.

      Kelly, M. M. and Forsyth, J. P. (2009). Associations between emotional avoidance, anxiety sensitivity, and reactions to an observational fear challenge procedure. Behaviour Research and Therapy, 47(4):331–338.

      Pischek-Simpson, L. K., Boschen, M. J., Neumann, D. L., and Waters, A. M. (2009). The development of an attentional bias for angry faces following Pavlovian fear conditioning. Behaviour Research and Therapy, 47(4):322–330.

      Wilson, R. C. and Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8:e49547.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Discussion: Could the authors discuss more the findings about Flavobacterium? Has it ever been associated with the urogenital tract?

      Page 13-14, line 252-268:

      ‘The genus Flavobacterium was defined in 1923 to encompass gram-negative, non-spore-forming rods of yellow pigment (44). The inclusiveness of this definition resulted in a collective of heterogeneous species. By 1984 the genus had been restricted to those that were also non-motile and non-gliding (44). More recently, with an increase in genomic profiling, many species previously considered to be of genus Flavobacterium have been reclassified to genus Chryseobacterium, Cytophaga, and Weeksella (45). Increasing numbers of Flavobacterium species are being discovered, such as gondwanense, collinsii, branchiarum, branchiicola, salegens and scophthalmum (46) (47) (48). The allocation of Flavobacterium aquatile to this genus remains controversial due to its motility (49). Flavobacterium species are widely distributed in the environment, including soil, freshwater and saltwater habitats (50) (51). There are many reports of pathogenic infections by Flavobacterium species in fish; however, human infections are rare (48). A handful of case reports have described opportunistic infections, including pneumonia, urinary tract infection, peritonitis and meningitis (52) (53) (54) (55). Flavobacterium lindanitolerans and Flavobacterium ceti have been isolated as causative agents in some of these cases (56) (54). Case reports also describe Flavobacterium odoratum as a causative agent in urinary tract infection, most often in immunocompromised patients or those with indwelling devices (57) (58) (59). However, this was one of the many species previously of genus Flavobacterium that have since been reclassified, in this case to genus Myroides (60). Notably, participants in our sample had no symptoms of urinary tract infection’.

      What is the relative abundance of Flavobacterium in the present study: this type of bacterium has been previously associated with contaminations (PMID: 25387460, 30497919).

      Page 13, line 244-247:

      ‘The Flavobacterium genus taxon we identified as significantly associated with abnormal semen quality and sperm morphology was present in 36.28% of the samples, with a mean relative abundance of 1.15% in those samples. This information and the mention of previous findings of Flavobacterium in contamination studies have been added to the discussion’.
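      The two summary figures can be computed as in the sketch below. It assumes that per-sample relative abundances are stored as fractions of reads and that a taxon counts as present when its abundance is non-zero; the input values are toy numbers, not study data. Assuming arithmetic means, the mean over all samples equals prevalence times the mean among positive samples, so the reported figures imply a mean relative abundance of roughly 0.3628 × 1.15% ≈ 0.42% across all samples.

```python
def prevalence_and_mean_abundance(rel_abundance):
    """rel_abundance: one relative abundance (fraction of reads) per sample.

    Returns (prevalence in %, mean relative abundance in % among samples
    where the taxon is present).
    """
    present = [v for v in rel_abundance if v > 0]
    prevalence = 100.0 * len(present) / len(rel_abundance)
    mean_in_present = 100.0 * sum(present) / len(present) if present else 0.0
    return prevalence, mean_in_present

# Toy values for four hypothetical samples.
print(prevalence_and_mean_abundance([0.0, 0.02, 0.0, 0.01]))
```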

      Figure 1: Increase the size of panel A.

      Amended.

      Figure 3: Can the authors indicate the relative abundance of each genus/species by the size of the node?

      Co-occurrence network figure has been modified to display relative abundance of nodes.

      Supplementary data: I don't see anywhere the decontam plots.

      Decontam plots as suggested in the package vignette https://benjjneb.github.io/decontam/vignettes/decontam_intro.html have been added to the GitHub repository. For practical purposes, the plot corresponding to the frequency test only displays a random subset (n=15) of the total taxa (n=82) flagged by this test as contaminants. The .csv files with the outputs of each filter are available in the same directory.

      Line 12: Check the sentence

      Line 15: Genera in italics

      Line 33: Change "overall quality of the spermatozoa" to "overall semen quality"

      Lines 18-20: Rephrase

      Line 87: 28F-Borrelia

      Line 134: "Seminal microbiota" or "Composition of the seminal microbiota"

      Line 159: "These included ... genera"

      Line 166: "Of note, Flavobacterium genus was..."

      Lines 187-188: Check sentence

      Thank you, these have been amended

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1:

      The biggest concern in this regard is: that almost all the characterization is performed in cultured dissociated neurons…

      While it is true that most of the characterization done in this paper was in cultured neurons, we verified that PFE3 mediates functional ablation of excitatory synapses in vivo (Fig. 3). Furthermore, GPHN.FingR-XIAP (GFE3), a protein very similar to the complex formed following activation of paGFE3 and chGFE3, has been extensively tested in vivo by us and others (1-4).

      Reviewer #2:

      For paGFE3 and chGFE3, the E3 ligase (RING domain of Mdm2) is overexpressed throughout cells as a separate construct. Although the authors show that Gephyrin is not significantly reduced without light or chemical activation, it remains possible that other proteins could be ubiquitinated due to the overexpressed E3 domain.

      In our previous paper (1), we tested neurons under 3 conditions: 1. expressing a construct similar to PBP-E3, consisting of a FingR with a randomized binding domain fused to the same XIAP RING domain used in paGFE3 and chGFE3 (RAND-E3). 2. expressing GPHN.FingR. 3. not expressing any exogenous proteins (control neurons). In each case, we found that expression of a variety of excitatory and inhibitory synaptic proteins was not significantly different when exposed to either of these exogenous proteins compared with control neurons.

      Recommendations for the authors:

      (1) Can the authors use the tools to show the ablation of endogenous PSD95 without FingR overexpression?

      The experiments described in Fig. 3 are an example of this type of experiment. Furthermore, the PSD-95.FingR was extensively tested and has been used in dozens of studies without any indication that its expression alters cellular function or morphology. Note also that the transcriptional regulation system of PSD-95.FingR limits the expression such that there is virtually no background, so it is not really being overexpressed.

      (2) I am missing some control experiments for the excitatory synapse ablator: can the authors show that cells transfected with the plasmid, without DOX, show similar numbers of synapses to neurons without transfection?

      We have added an experiment comparing cells expressing PSD-95.FingR alone, and others expressing PFE3 with no Dox. We found that the two types of cells express amounts of PSD-95 that are not significantly different (Fig. S2L).

      (3) I am not quite sure how they used paired statistics on staining since they could only stain the cell at the end of the experiment. Are the comparisons performed on different cells?

      These experiments were done on the same cells. However, the methods of labeling differed between the initial and final counts of synapses, so we agree with the reviewer that it would be best not to use a paired analysis. Accordingly, we have changed Figs. 1F and 2D.
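      The practical difference between the two analyses can be seen in the test statistics themselves. In this pure-Python sketch (toy values, not the study's data), the paired t statistic is computed on per-cell differences, which removes between-cell variability but is only meaningful if both measurements are on a comparable scale. Welch's unpaired t statistic, used here only as an example of an unpaired test, treats the two sets of counts as independent samples; the revised figures may use a different unpaired test.

```python
import math
from statistics import mean, stdev

def paired_t(first, second):
    """t statistic on per-cell differences (assumes both measurements are directly comparable)."""
    diffs = [b - a for a, b in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def welch_t(first, second):
    """Unpaired (Welch) t statistic, treating the two sets of values as independent samples."""
    se = math.sqrt(stdev(first) ** 2 / len(first) + stdev(second) ** 2 / len(second))
    return (mean(second) - mean(first)) / se

# Toy synapse counts for four hypothetical cells, measured twice.
initial = [10, 12, 14, 16]
final = [11, 14, 15, 18]
print(paired_t(initial, final), welch_t(initial, final))
```

      With these toy numbers the paired statistic is much larger than the unpaired one, because the consistent per-cell shift is isolated from the spread between cells; that gain is only legitimate when the two counts were obtained in the same way.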

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their efforts. They have pointed out several shortcomings and made very helpful suggestions. Based on their feedback, we have substantially revised the manuscript and feel the paper has been much improved because of it.

      Notable changes are:

      (1) As our model does not contain feed-back connections, the focus of the study is now more clearly communicated to be on feed-forward processes only, with appropriate justifications for this choice added to the Introduction and Discussion sections. Accordingly, the title has been changed to include the term “feed-forward”.

      (2) The old Figure 5 has been removed in favor of reporting correlation scores to the right of the response profiles in other figures.

      (3) We now discuss changes to the network architecture (new Figure 5) and fine-tuning of the hyperparameters (new Figure 6) in the main text instead of only the Supplementary Information.

      (4) The discussion on qualitative versus quantitative analysis has been extended and given its own subsection entitled “On the importance of experimental contrasts and qualitative analysis of the model”.

      Below, we address each point that the reviewers brought up in detail and outline what improvements we have made in the revision to address them.

      Reviewer #1 (Public Review):

      Summary:

      This study trained a CNN for visual word classification and supported a model that can explain key functional effects of the evoked MEG response during visual word recognition, providing an explicit computational account from detection and segmentation of letter shapes to final word-form identification.

      Strengths:

      This paper not only bridges an important gap in modeling visual word recognition, by establishing a direct link between computational processes and key findings in experimental neuroimaging studies, but also provides some conditions to enhance biological realism.

      Weaknesses:

      The interpretation of CNN results, especially the number of layers in the final model and its relationship with the processing of visual words in the human brain, needs to be further strengthened.

      We have experimented with the number of layers and the number of units in each layer. In the previous version of the manuscript, these results could be found in the supplementary information. For the revised version, we have brought some of these results into the main text and discuss them more thoroughly.

      We have added a figure (Figure 5 in the revised manuscript) showing the impact of the number of convolution and fully-connected layers on the response profiles of the layers, as well as the correlation with the three MEG components.

      We discuss the figure in the Results section as follows:

      “Several variations in model architecture and training procedure were evaluated. We found that the number of layers had a large impact on the response patterns produced by the model (Figure 5). The original VGG-11 architecture defines 5 convolution layers and 3 fully connected layers (including the output layer). Removing a convolution layer (Figure 5, top row), or removing one of the fully connected layers (Figure 5, second row), resulted in a model that did exhibit an enlarged response to noisy stimuli in the early layers that mimics the Type-I response. However, such models failed to show a sufficiently diminished response to noisy stimuli in the later layers, and hence failed to produce responses that mimic the Type-II or N400m, a failure that was also reflected in low correlation scores.

      Adding an additional convolution layer (Figure 5, third row) resulted in a model where none of the layer response profiles mimics that of the Type-II response. The Type-II response is characterized by a reduced response to both noise and symbols, but an equally large response to consonant strings, real words and pseudowords. However, in the model with an additional convolution layer, the consonant strings evoked a reduced response already in the first fully connected layer, which is a feature of the N400m rather than the Type-II. These kinds of subtleties in the response pattern, which are important for the qualitative analysis, were generally not reflected in the correlation scores, as the fully connected layers in this model correlate as well with the Type-II response as those of models that did show a response pattern mimicking the Type-II.

      Adding an additional fully connected layer (Figure 5, fourth row) resulted in a model with similar response profiles and correlation with the MEG components as the original VGG-11 architecture (Figure 5, bottom row). The N400m-like response profile is now observed in the third fully connected layer rather than the output layer. However, the decrease in response to consonant strings versus real and pseudo words, which is typical of the N400m, is less distinct than in the original VGG-11 architecture.”

      And in the Discussion section:

      “In the model, convolution units are followed by pooling units, which serve the purpose of stratifying the response across changes in position, size and rotation within the receptive field of the pooling unit. Hence, the effect of small differences in letter shape, such as the usage of different fonts, was only present in the early convolution layers, in line with findings in the EEG literature (Chauncey et al., 2008; Grainger & Holcomb, 2009; Hauk & Pulvermüller, 2004). However, the ability of pooling units to stratify such differences depends on the size of their receptive field, which is determined by the number of convolution-and-pooling layers. As a consequence, the response profiles of the subsequent fully connected layers were also very sensitive to the number of convolution-and-pooling layers. The optimal number of such layers is likely dependent on the input size and pooling strategy. Given the VGG-11 design of doubling the receptive field after each layer, combined with an input size of 225×225 pixels, the optimal number of convolution-and-pooling layers for our model was five, or the model would struggle to produce response profiles mimicking those of the Type-II component in the subsequent fully connected layers (Figure 5).”
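      The relationship between the number of convolution-and-pooling stages, the input size and the receptive field can be worked through with the standard receptive-field recursion (each layer adds (kernel − 1) × current spacing to the receptive field, and each stride multiplies the spacing). This sketch simplifies every stage to one 3×3 convolution (stride 1, padding 1) followed by 2×2 max-pooling with stride 2. The standard VGG-11 configuration stacks one or two convolutions before each pooling step, so its actual receptive fields are larger; the numbers here are illustrative only. The sketch shows the spacing between pooling units doubling at each stage and a 225×225 input shrinking to a 7×7 map after five stages.

```python
def conv_pool_stages(input_size, n_stages, conv_k=3, pool_k=2, pool_s=2):
    """Return (feature-map size, receptive field in pixels) after each stage.

    Simplifying assumption: each stage = one conv (stride 1, 'same' padding)
    followed by one max-pool.
    """
    size, rf, jump = input_size, 1, 1  # jump = spacing of units in input pixels
    out = []
    for _ in range(n_stages):
        rf += (conv_k - 1) * jump                # conv widens the receptive field
        size = (size - pool_k) // pool_s + 1     # pooling shrinks the feature map
        rf += (pool_k - 1) * jump                # pooling widens it a little more
        jump *= pool_s                           # and doubles the unit spacing
        out.append((size, rf))
    return out

for stage, (size, rf) in enumerate(conv_pool_stages(225, 5), start=1):
    print(f"stage {stage}: {size}x{size} map, receptive field {rf} px")
```

      In this simplified stack, a sixth stage would shrink the map to 3×3 and roughly double the receptive field again, to 190 pixels.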

      Reviewer #1 (Recommendations For The Authors):

      (1) The similarity between the CNN and the human MEG responses, including the type-I (100 ms), type-II (150 ms), and N400 (400 ms) components, appears to be assessed separately for each component, without considering the sequential relationship among these three components. Would a recurrent neural network (RNN), which can be trained to convert a sequential input into a specific sequential output, be a better choice?

      When modeling sequential effects, meaning that the processing of the current word is influenced by the word that came before it, such as priming and top-down modulations, we agree that such a model would indeed require recurrency in its architecture. However, we feel that the focus of modeling efforts in reading has been overwhelmingly on the N400 and such priming effects, usually skipping over the pixel-to-letter process. So, for this paper, we were keen on exploring more basic effects such as noise and symbols versus letters on the type-I and type-II responses. And for these effects, a feed-forward model turns out to be sufficient, so we can keep the focus of this particular paper on bottom-up processes during single word reading, on which there is already a lot to say.

      To clarify our focus on feed-forward process, we have modified the title of the paper to be:

      “Convolutional networks can model the functional modulation of the MEG responses associated with feed-forward processes during visual word recognition”. Furthermore, we have revised the Introduction to highlight this choice, noting:

      “Another limitation is that these models have primarily focused on feed-back lexicosemantic effects while oversimplifying the initial feed-forward processing of the visual input.

      […]

      For this study, we chose to focus on modeling the early feed-forward processing occurring during visual word recognition, as the experimental setup in Vartiainen et al. (2011) was designed to demonstrate.

      […]

      By doing so, we restrict ourselves to an investigation of how well the three evoked components can be explained by a feed-forward CNN in an experimental setting designed to demonstrate feed-forward effects. As such, the goal is not to present a complete model of all aspects of reading, which should include feed-back effects, but rather to demonstrate the effectiveness of using a model that has a realistic form of input when the aim is to align the model with the evoked responses observed during visual word recognition.”

      And in the Discussion section:

      “In this paper we have restricted our simulations to feed-forward processes. Now, the way is open to incorporate convolution-and-pooling principles in models of reading that simulate feed-back processes as well, which should allow the model to capture more nuance in the Type-II and N400m components, as well as extend the simulation to encompass a realistic semantic representation.”

      (2) There is no clear relationship between the layers that signal needs to traverse in the model and the relative duration of the three components in the brain.

      While some models offer a tentative mapping between layers and locations in the brain, none of the models we are aware of actually simulate time accurately and our model is no exception.

      While we provide some evidence that the three MEG components are best modeled with different types of layers, and that in our model the type-I arises before the type-II, with the N400m last, the lack of timing information is a weakness of our model that we have not been able to address. In our previous version, this was already the main topic of our “Limitations of the model” section, but since all reviewers pointed out this weakness, we have decided to widen our discussion of it:

      “One important limitation of the current model is the lack of an explicit mapping from the units inside its layers to specific locations in the brain at specific times. The temporal ordering of the components is simulated correctly, with the layers whose response profiles match the type-I occurring before those matching the type-II, followed by the N400m. Furthermore, every component is best modeled by a different type of layer, with the type-I best described by convolution-and-pooling, the type-II by fully-connected linear layers and the N400m by a one-hot encoded layer. However, there is no clear relationship between the number of layers the signal needs to traverse in the model and the processing time in the brain. Even if one considers that the operations performed by the initial two convolution layers happen in the retina rather than the brain, the signal needs to propagate through three more convolution layers to reach the point where it matches the type-II component at 140-200 ms, but only through one additional layer to reach the point where it starts to match the N400m component at 300-500 ms. Still, cutting down on the number of times convolution is performed in the model seems to make it unable to achieve the desired suppression of noise (Figure 5). It also raises the question of what the brain is doing during the seemingly long interval between the type-II and N400m components. It is possible that the timings of the MEG components are not indicative solely of when the feed-forward signal first reaches a certain location, but are rather dictated by the resolution of feed-forward and feedback signals (Nour Eddine et al., 2024).”

      See also our response to the next comment of the Reviewer, in which we dive more into the effect of the number of layers, which could be seen as a manipulation of time.

      (3) I am impressed by the CNN that authors modified to match the human brain pattern for the visual word recognition process, by the increase and decrease of the number of layers. The result of this part was a little different from the author’s expectation; however, the author didn’t explain or address this issue.

      We are glad to hear that the reviewer found these results interesting. Accordingly, we now discuss these results more thoroughly in the main text.

      We have moved the figure from the supplementary information to the main text (Figure 5 in the revised manuscript) and describe the results in the Results section:

      “Several variations in model architecture and training procedure were evaluated. We found that the number of layers had a large impact on the response patterns produced by the model (Figure 5). The original VGG-11 architecture defines 5 convolution layers and 3 fully connected layers (including the output layer). Removing a convolution layer (Figure 5, top row), or removing one of the fully connected layers (Figure 5, second row), resulted in a model that did exhibit an enlarged response to noisy stimuli in the early layers that mimics the Type-I response. However, such models failed to show a sufficiently diminished response to noisy stimuli in the later layers, and hence failed to produce responses that mimic the Type-II or N400m, a failure that was also reflected in low correlation scores.

      Adding an additional convolution layer (Figure 5, third row) resulted in a model where none of the layer response profiles mimics that of the Type-II response. The Type-II response is characterized by a reduced response to both noise and symbols, but an equally large response to consonant strings, real words and pseudowords. However, in the model with an additional convolution layer, the consonant strings evoked a reduced response already in the first fully connected layer, which is a feature of the N400m rather than the Type-II. These kinds of subtleties in the response pattern, which are important for the qualitative analysis, were generally not reflected in the correlation scores, as the fully connected layers in this model correlate as well with the Type-II response as those of models that did show a response pattern mimicking the Type-II.

      Adding an additional fully connected layer (Figure 5, fourth row) resulted in a model with similar response profiles and correlation with the MEG components as the original VGG-11 architecture (Figure 5, bottom row). The N400m-like response profile is now observed in the third fully connected layer rather than the output layer. However, the decrease in response to consonant strings versus real and pseudo words, which is typical of the N400m, is less distinct than in the original VGG-11 architecture.”

      We also incorporated these results in the Discussion:

      “However, the ability of pooling units to stratify such differences depends on the size of their receptive field, which is determined by the number of convolution-and-pooling layers. This might also explain why, in later layers, we observed a decreased response to stimuli where text was rendered with a font size exceeding the receptive field of the pooling units (Figure 8). Hence, the response profiles of the subsequent fully connected layers were very sensitive to the number of convolution-and-pooling layers. This number is probably dependent on the input size and pooling strategy. Given the VGG11 design of doubling the receptive field after each layer, combined with an input size of 225×225 pixels, the optimal number of convolution-and-pooling layers for our model was five, or the model would struggle to produce response profiles mimicking those of the type-II component in the subsequent fully connected layers (Figure 5).

      […]

      A minimum of two fully connected layers was needed to achieve this in our case, and adding more fully connected layers would make them behave more like the component (Figure 5).”
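      The receptive-field reasoning in this passage can be checked with a few lines of arithmetic. The sketch below is illustrative only and assumes the standard VGG-11 layer configuration (configuration “A” in Simonyan & Zisserman, 2015: 3×3 convolutions with stride 1 and padding 1, and 2×2 max-pooling with stride 2) applied to a 225×225 input; the names `VGG11_LAYERS` and `receptive_fields` are hypothetical and not part of our code. Strictly speaking, it is the effective stride between neighbouring units that doubles at each pooling stage, which is why the receptive field roughly doubles as well.

```python
# Hypothetical sketch: receptive-field and feature-map arithmetic for the
# standard VGG-11 convolution-and-pooling stack ("C" = 3x3 conv, "M" = 2x2 max-pool).
VGG11_LAYERS = ["C", "M", "C", "M", "C", "C", "M", "C", "C", "M", "C", "C", "M"]


def receptive_fields(layers, input_size=225, kernel=3, pool=2):
    """Return (receptive field in pixels, feature-map size) after each pooling stage."""
    rf, jump, size = 1, 1, input_size
    stages = []
    for layer in layers:
        if layer == "C":
            # Conv with stride 1 and padding 1: RF grows by (kernel - 1) * jump,
            # feature-map size is unchanged.
            rf += (kernel - 1) * jump
        else:
            # Max-pool with stride 2: RF grows by (pool - 1) * jump, the distance
            # between neighbouring units (jump) doubles, and the map is halved (floor).
            rf += (pool - 1) * jump
            jump *= pool
            size //= pool
            stages.append((rf, size))
    return stages


stages = receptive_fields(VGG11_LAYERS)
for i, (rf, size) in enumerate(stages, start=1):
    print(f"after pooling stage {i}: receptive field {rf} px, feature map {size}x{size}")
```

      Under these assumptions, units after the fifth pooling stage have a receptive field spanning 150 pixels on a side (ignoring padding at the image border) and lie on a 7×7 grid; a sixth pooling stage would shrink the grid to 3×3.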

      (4) Can the authors explain why the number of layers in the final model is optimal by benchmarking against the brain hierarchy?

      We have merged the figure describing the correlation between each model and the MEG components (previously Figure 5) into the figures describing the response profiles (Figures 4 and 5 in the revised manuscript and Supplementary Figures 2-6). This way, we (and the reader) can now benchmark every model both qualitatively and quantitatively.

      As we stated in our response to the previous comment, we have added a more thorough discussion of the number of layers, which includes the justification for our choice of the final model. The benchmark we used was primarily whether the model shows the same response patterns as the Type-I, Type-II and N400m responses, which disqualifies all models with fewer than 5 convolution and 3 fully connected layers. Models with more layers also show the proper response patterns; however, there is very little difference in the correlation scores between the different models. Hence, our justification for sticking with the original VGG11 architecture is that it produces the qualitatively best response profiles, while having roughly the same (decently high) correlation with the MEG components. Furthermore, by sticking to the standard architecture, we make it slightly easier to replicate our results, as one can use readily available pre-trained ImageNet weights.

      In addition to always discussing the correlation scores in tandem with the qualitative analysis, we have added the following statement to the Results:

      “Based on our qualitative and quantitative analysis, the model variant that performed best overall was the model that had the original VGG11 architecture and was preinitialized from earlier training on ImageNet, as depicted in the bottom rows of Figure 4 and Figure 5.”

      Reviewer #2 (Public Review):

      As has been shown over many decades, many potential computational algorithms, with varied model architectures, can perform the task of text recognition from an image. However, there is no evidence presented here that this particular algorithm has comparable performance to human behavior (i.e. similar accuracy with a comparable pattern of mistakes). This is a fundamental prerequisite before attempting to meaningfully correlate these layer activations to human neural activations. Therefore, it is unlikely that correlating these derived layer weights to neural activity provides meaningful novel insights into neural computation beyond what is seen using traditional experimental methods.

      We very much agree with the reviewer that a qualitative analysis of whether the model can explain experimental effects needs to happen before a quantitative analysis, such as evaluating model-brain correlation scores. In fact, this is one of the intended key points we wished to make.

      As we discuss at length in the Introduction, “traditional” models of reading (those that do not rely on deep learning) are not able to recognize a word regardless of exact letter shape, size, and (up to a point) rotation. In this study, our focus is on these low-level visual tasks rather than high-level tasks concerning semantics. As the Reviewer correctly states, there are many potential computational algorithms able to perform these visual tasks at a human level, and so we need to evaluate the model not only on its ability to mimic human accuracy but also on generating a comparable pattern of mistakes. In our case, we need a pattern of behavior that is indicative of the visual processes at the beginning of the reading pipeline. Hence, rather than relying on behavioral responses that are produced at the very end, we chose to evaluate the model based on three MEG components that provide “snapshots” of the reading process at various stages. These components are known to manifest a distinct pattern of “behavior” in the way they respond to different experimental conditions (Figure 2), akin to what the Reviewer refers to as a “pattern of mistakes”. The model was first evaluated on its ability to replicate the behavior of the MEG components in a qualitative manner (Figure 4). Only then did we move on to a quantitative correlation analysis. In this manner, we feel we are in agreement with the approach advocated by the Reviewer.

      In the Introduction, we now clarify:

      “Another limitation is that these models have primarily focused on feed-back lexico-semantic effects while oversimplifying the initial feed-forward processing of the visual input.

      […]

      We sought to construct a model that is able to recognize words regardless of length, size, typeface and rotation, as well as humans can, so essentially perfectly, whilst producing activity that mimics the type-I, type-II, and N400m components which serve as snapshots of this process unfolding in the brain.

      […]

      These variations were first evaluated on their ability to replicate the experimental effects in that study, namely that the type-I response is larger for noise embedded words than all other stimuli, the type-II response is larger for all letter strings than symbols, and that the N400m is larger for real and pseudowords than consonant strings. Once a variation was found that could reproduce these effects satisfactorily, it was further evaluated based on the correlation between the amount of activation of the units in the model and MEG response amplitude.”
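      For readers who want to see what such a correlation involves, here is a minimal sketch. It assumes that the activation of a layer is summarized per stimulus as the L2 norm of its unit activations (an assumption made for this illustration) and that the MEG response amplitude is given as one value per stimulus; Pearson's correlation is computed with NumPy's `corrcoef`. The helper names are hypothetical, and the data below are synthetic and only show the mechanics; they are not results from the study.

```python
import numpy as np


def layer_activity(activations):
    """Summarize each stimulus' layer response as a single number.

    activations: array of shape (n_stimuli, n_units).
    The L2 norm is an illustrative choice for this sketch.
    """
    return np.linalg.norm(activations, axis=1)


def layer_meg_correlation(activations, meg_amplitude):
    """Pearson correlation between per-stimulus layer activity and MEG amplitude."""
    return np.corrcoef(layer_activity(activations), meg_amplitude)[0, 1]


# Synthetic illustration: 40 stimuli, 512 units, and a fake "MEG amplitude"
# that tracks the layer activity plus some noise.
rng = np.random.default_rng(0)
acts = rng.random((40, 512))
meg = layer_activity(acts) + rng.normal(scale=0.1, size=40)
r = layer_meg_correlation(acts, meg)
print(f"r = {r:.2f}")
```

      Repeating this computation for every layer and every MEG component gives a layer-by-component table of correlation scores.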

      To make this prerequisite clearer, we have removed what was previously Figure 5, which showed the correlation between the various models and the MEG components out of the context of their response patterns. Instead, these correlation values are now always presented next to the response patterns (Figures 4 and 5, and Supplementary Figures 2-6 in the revised manuscript). This invites the reader to always consider these metrics in relation to one another.

      One example of a substantial discrepancy between this model and neural activations is that, while incorporating frequency weighting into the training data is shown to slightly increase neural correlation with the model, Figure 7 shows that no layer of the model appears directly sensitive to word frequency. This is in stark contrast to the strong neural sensitivity to word frequency seen in EEG (e.g. Dambacher et al 2006 Brain Research), fMRI (e.g. Kronbichler et al 2004 NeuroImage), MEG (e.g. Huizeling et al 2021 Neurobio. Lang.), and intracranial (e.g. Woolnough et al 2022 J. Neurosci.) recordings. Figure 7 also demonstrates that the late stages of the model show a strong negative correlation with font size, whereas later stages of neural visual word processing are typically insensitive to differences in visual features, instead showing sensitivity to lexical factors.

      We are glad the reviewer brought up the topic of frequency balancing, as it is a good example of the importance of the qualitative analysis. Frequency balancing during training only had a moderate impact on correlation scores and from that point of view does not seem impactful. However, when we look at the qualitative evaluation, we see that with a large vocabulary, a model without frequency balancing fails to properly distinguish between consonant strings and (pseudo)words (Figure 4, 5th row). Hence, from the point of view of being able to reproduce experimental effects, frequency balancing had a large impact. We now discuss this more explicitly in the revised Discussion section:

      “Overall, we found that a qualitative evaluation of the response profiles was more helpful than correlation scores. Often, a deficit in the response profile of a layer that would cause a decrease in correlation on one condition would be masked by an increased correlation in another condition. A notable example is the necessity for frequency-balancing the training data when building models with a vocabulary of 10 000. Going by correlation score alone, there does not seem to be much difference between the model trained with and without frequency balancing (Figure 4A, fifth row versus bottom row). However, without frequency balancing, we found that the model did not show a response profile where consonant strings were distinguished from words and pseudowords (Figure 4A, fifth row), which is an important behavioral trait that sets the N400m component apart from the Type-II component (Figure 2D). This underlines the importance of the qualitative evaluation in this study, which was only possible because of a straightforward link between the activity simulated within a model and measurements obtained from the brain, combined with the presence of clear experimental conditions.”

      It is true that the model, even with frequency balancing, only captures letter- and bigram-frequency effects and not the word-frequency effects that we know the N400m is sensitive to. Since our model is restricted to feed-forward processes, this finding adds to the evidence that frequency-modulated effects are driven by feed-back effects, as modeled by Nour Eddine et al. (2024, doi:10.1016/j.cognition.2024.105755). See also our response to the next comment by the Reviewer, where we discuss feed-back connections. We have added the following to the section about model limitations in the revised Discussion:

      “The fact that the model failed to simulate the effects of word-frequency on the N400m (Figure 8), even after frequency-balancing of the training data, is additional evidence that this effect may be driven by feed-back activity, as for example modeled by Nour Eddine et al. (2024).”

      Like the Reviewer, we initially thought that later stages of neural visual word processing would be insensitive to differences in font size. When diving into the literature to find support for this claim, we found only a few works directly studying the effect of font size on evoked responses, but, surprisingly, what we did find seemed to align with our model. We have added the following to our revised Discussion:

      “The fully connected linear layers in the model show a negative correlation with font size. While the N400 has been shown to be unaffected by font size during repetition priming (Chauncey et al., 2008), it has been shown that in the absence of priming, larger font sizes decrease the evoked activity in the 300–500 ms window (Bayer et al., 2012; Schindler et al., 2018). Those studies refer to the activity within this time window, which seems to encompass the N400, as early posterior negativity (EPN). What possibly happens in the model is that an increase in font size causes an initial stronger activation in the first layers, due to more convolution units receiving input. This leads to a better signal-to-noise ratio (SNR) later on, as the noise added to the activation of the units remains constant whilst the amplitude of the input signal increases. A better SNR ultimately translates into less co-activation of units corresponding to orthographic neighbours in the final layers, hence into a decrease in overall layer activity.”
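      The signal-to-noise step in this explanation can be illustrated numerically. This minimal sketch assumes additive Gaussian noise with a fixed standard deviation on every unit, and a noise-free activation that simply scales with the input (a stand-in for a larger font size); the function name `snr_db` is hypothetical. It does not simulate the subsequent reduction in co-activation of orthographic neighbours, only the claim that a constant noise level yields a better SNR for a stronger signal.

```python
import numpy as np


def snr_db(signal_scale, noise_std=1.0, n_units=1000, seed=0):
    """Empirical SNR (in dB) of unit activations with fixed-variance additive noise."""
    rng = np.random.default_rng(seed)
    # Stand-in noise-free activations whose amplitude grows with the input.
    signal = np.full(n_units, signal_scale)
    # The noise level does not depend on the signal amplitude.
    noise = rng.normal(scale=noise_std, size=n_units)
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))


for scale in (1.0, 2.0, 4.0):
    print(f"signal scale {scale}: SNR = {snr_db(scale):.1f} dB")
```

      Because the noise draw is identical across calls (same seed), each doubling of the signal amplitude adds 20·log10(2), or about 6 dB, to the SNR.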

      Another example of the mismatch between this model and the visual cortex is the lack of feedback connections in the model. Within the visual cortex, there are extensive feedback connections, with later processing stages providing recursive feedback to earlier stages. This is especially evident in reading, where feedback from lexical-level processes feeds back to letter-level processes (e.g. Heilbron et al 2020 Nature Comms.). This feedback is especially relevant for the reading of words in noisy conditions, as tested in the current manuscript, as lexical knowledge enhances letter representation in the visual cortex (the word superiority effect). This results in neural activity in multiple cortical areas varying over time, changing selectivity within a region at different measured time points (e.g. Woolnough et al 2021 Nature Human Behav.), which in the current study is simplified down to three discrete time windows, each attributed to different spatial locations.

      We agree with the Reviewer that a full model of reading in the brain must include feed-back connections and share their sentiment that these feed-back processes play an important role and are a fascinating topic to study. The intent for the model presented in our study is very much to be a stepping stone towards extending the capabilities of models that do include such connections.

      However, there is a problem of scale that cannot be ignored.

      Current models of reading that do include feedback connections fall into the category we refer to in the paper as “traditional models”; they are all only a few layers deep and operate on very simplified inputs, such as pre-defined line segments, a few pixels, or even a list of pre-recognized letters. The Heilbron et al. 2020 study that the Reviewer refers to is a good example of such a model. (This excellent and relevant work was somehow overlooked in our literature discussion in the Introduction. We thank the Reviewer for pointing it out to us.) Models incorporating realistic feed-back activity need these simplifications, because they tend to no longer converge when there are too many layers and units. However, in order for models of reading to be able to simulate cognitive behavior such as resolving variations in font size or typeface, or distinguishing text from non-text, they need to operate on something close to the pixel-level data, which means they need many layers and units.

      Hence, as a stepping stone, it is reasonable to evaluate a model that has the necessary scale, but lacks the feed-back connections that would be problematic at this scale, to see what it can and cannot do in terms of explaining experimental effects in neuroimaging studies. This was the intended scope of our study. For the revision, we have attempted to make this clearer.

      We have changed the title to be:

      “Convolutional networks can model the functional modulation of the MEG responses associated with feed-forward processes during visual word recognition” and added the following to the Introduction:

      “The simulated environments in these models are extremely simplified, partly due to computational limitations and partly due to the complex interaction of feed-forward and feed-back connectivity that causes problems with convergence when the model grows too large. Consequently, these models have primarily focused on feed-back lexico-semantic effects while oversimplifying the initial feed-forward processing of the visual input. 

      […]

      This rather high level of visual representation sidesteps having to deal with issues such as visual noise, letters with different scales, rotations and fonts, segmentation of the individual letters, and so on. More importantly, it makes it impossible to create the visual noise and symbol string conditions used in the MEG study to modulate the type-I and type-II components. In order to model the process of visual word recognition to the extent where one may reproduce neuroimaging studies such as Vartiainen et al. (2011), we need to start with a model of vision that is able to directly operate on the pixels of a stimulus. We sought to construct a model that is able to recognize words regardless of length, size, typeface and rotation with very high accuracy, whilst producing activity that mimics the type-I, type-II, and N400m components which serve as snapshots of this process unfolding in the brain. For this model, we chose to focus on the early feed-forward processing occurring during visual word recognition, as the experimental setup in the MEG study was designed to demonstrate, rather than on feed-back effects.

      […]

      By doing so, we restrict ourselves to an investigation of how well the three evoked components can be explained by a feed-forward CNN in an experimental setting designed to demonstrate feed-forward effects. As such, the goal is not to present a complete model of all aspects of reading, which should include feed-back effects, but rather to demonstrate the effectiveness of using a model that has a realistic form of input when the aim is to align the model with the evoked responses observed during visual word recognition.”

      And we have added the following to the Discussion section:

      “In this paper we have restricted our simulations to feed-forward processes. Now, the way is open to incorporate convolution-and-pooling principles in models of reading that simulate feed-back processes as well, which should allow the model to capture more nuance in the Type-II and N400m components, as well as extend the simulation to encompass a realistic semantic representation. A promising way forward may be to use a network architecture like CORNet (Kubilius et al., 2019), that performs convolution multiple times in a recurrent fashion, yet simultaneously propagates activity forward after each pass. The introduction of recursion into the model will furthermore align it better with traditional-style models, since it can cause a model to exhibit attractor behavior (McLeod et al., 2000), which will be especially important when extending the model into the semantic domain.

      Furthermore, convolution-and-pooling has recently been explored in the domain of predictive coding models (Ororbia & Mali, 2023), a type of model that seems particularly well suited to model feed-back processes during reading (Gagl et al., 2020; Heilbron et al., 2020; Nour Eddine et al., 2024).”

      We also would like to point out to the Reviewer that we did in fact perform a correlation between the model and the MNE-dSPM source estimate of all cortical locations and timepoints (Figure 7B). Such a brain-wide correlation map confirms that the three dipole groups are excellent summaries of when and where interesting effects occur within this dataset.

      The presented model needs substantial further development to be able to replicate, both behaviorally and neurally, many of the well-characterized phenomena seen in human behavior and neural recordings that are fundamental hallmarks of human visual word processing. Until that point, it is unclear what novel contributions can be gleaned from correlating low-dimensional model weights from these computational models with human neural data.

      We hope that our revisions have clarified the goals and scope of this study. The CNN model we present in this study is a small but, we feel, essential piece in a bigger effort to employ deep learning techniques to further enhance already existing models of reading. In our revision, we have extended our discussion of where to go from here and outlined our vision of how these techniques could help us better model the phenomena the Reviewer speaks of. We agree with the Reviewer that there is a long way to go, and we are excited to be a part of it.

      In addition to the changes described above, we now end the Discussion section as follows: 

      “Despite its limitations, our model is an important milestone for computational models of reading that leverage deep learning techniques to encompass the entire computational process, from raw pixel values to representations of wordforms in the mental lexicon. The overall goal is to work towards models that can reproduce the dynamics of brain activity observed in the large number of neuroimaging experiments with human volunteers performed over the last few decades. To achieve this, models need to be able to operate on more realistic inputs than a collection of predefined lines or letter banks (for example: Coltheart et al., 2001; Heilbron et al., 2020; Laszlo & Armstrong, 2014; McClelland & Rumelhart, 1981; Nour Eddine et al., 2024). We have shown that even without feed-back connections, a CNN can simulate the behavior of three important MEG evoked components across a range of experimental conditions, but only if unit activations are noisy and the frequency of occurrence of words in the training dataset mimics their frequency of use in actual language.”

      Reviewer #3 (Public Review):

      The paper is rather qualitative in nature. In particular, the authors show that some resemblance exists between the behavior of some layers and some parts of the brain, but it is hard to quantitatively understand how strong the resemblances are in each layer, and the exact impact of experimental settings such as the frequency balancing (which seems to only have a very moderate effect according to Figure 5).

      The strong focus on a qualitative evaluation of the model is intentional. The ability of the model to reproduce experimental effects (Figure 4) is a prerequisite for any subsequent quantitative metrics (such as correlation) to be valid. The introduction of frequency balancing is a good example of this. As the reviewer points out, frequency balancing during training has only a moderate impact on correlation scores and from that point of view does not seem impactful. However, when we look at the qualitative evaluation, we see that with a large vocabulary, a model without frequency balancing fails to properly distinguish between consonant strings and (pseudo)words (Figure 4, 5th row). Hence, from the point of view of being able to reproduce experimental effects, frequency balancing has a large impact.

      That said, the reviewer is right to highlight the value of quantitative analysis. An important limitation of the “traditional” models of reading that do not employ deep learning is that they operate in unrealistically simplified environments (e.g. input as predefined line segments, words of a fixed length), which makes a quantitative comparison with brain data problematic. The main benefit that deep learning brings may very well be the increase in scale that makes more direct comparisons with brain data possible. In our revision we attempt to capitalize on this benefit more. The reviewer has provided some helpful suggestions for doing so in their recommendations, which we discuss in detail below.

      We have added the following discussion on the topic of qualitative versus quantitative analysis to the Introduction:

      “We sought to construct a model that is able to recognize words regardless of length, size, typeface and rotation, as well as humans can, so essentially perfectly, whilst producing activity that mimics the type-I, type-II, and N400m components which serve as snapshots of this process unfolding in the brain.

      […]

      These variations were first evaluated on their ability to replicate the experimental effects in that study, namely that the type-I response is larger for noise embedded words than all other stimuli, the type-II response is larger for all letter strings than symbols, and that the N400m is larger for real and pseudowords than consonant strings. Once a variation was found that could reproduce these effects satisfactorily, it was further evaluated based on the correlation between the amount of activation of the units in the model and MEG response amplitude.”

      We have also followed this up in the Discussion with a new sub-section entitled “On the importance of experimental contrasts and qualitative analysis of the model”.

      The experiments only consider a rather outdated vision model (VGG).

      VGG was designed to use a minimal number of operations (convolution-and-pooling, fully-connected linear steps, ReLU activations, and batch normalization) and to rely mostly on scale to solve the classification task. This makes VGG a good place to start our explorations and see how far a basic CNN can take us in terms of explaining experimental MEG effects in visual word recognition. However, we agree with the reviewer that it is easy to envision more advanced models that could potentially explain more. In our revision, we expand on the question of where to go from here and outline our vision of what types of models would be worth investigating and how one may go about doing that in a way that provides insights beyond higher correlation values.

      We have included the following in our Discussion sub-sections on “Limitations of the current model and the path forward”:

      “The VGG-11 architecture was originally designed to achieve high image classification accuracy on the ImageNet challenge (Simonyan & Zisserman, 2015). Although we have introduced some modifications that make the model more biologically plausible, the final model is still incomplete in many ways as a complete model of brain function during reading.

      […]

      In this paper we have restricted our simulations to feed-forward processes. Now, the way is open to incorporate convolution-and-pooling principles in models of reading that simulate feed-back processes as well, which should allow the model to capture more nuance in the Type-II and N400m components, as well as extend the simulation to encompass a realistic semantic representation. A promising way forward may be to use a network architecture like CORNet (Kubilius et al., 2019), that performs convolution multiple times in a recurrent fashion, yet simultaneously propagates activity forward after each pass. The introduction of recursion into the model will furthermore align it better with traditional-style models, since it can cause a model to exhibit attractor behavior (McLeod et al., 2000), which will be especially important when extending the model into the semantic domain. Furthermore, convolution-and-pooling has recently been explored in the domain of predictive coding models (Ororbia & Mali, 2023), a type of model that seems particularly well suited to model feed-back processes during reading (Gagl et al., 2020; Heilbron et al., 2020; Nour Eddine et al., 2024).”

      Reviewer #3 (Recommendations For The Authors):

      (1) The method used to select the experimental conditions under which the behavior of the CNN is the most brain-like is rather qualitative (Figure 4). It would have been nice to have a plot where the noisyness of the activations, the vocab size and the amount of frequency balancing are varied continuously, and show how these three parameters impact the correlation of the model layers with the MEG responses.

      We now include this analysis (Figure 6 in the revised manuscript, Supplementary Figures 4–7) and discuss these factors in the revised Results section:

      “Various other aspects of the model architecture were evaluated that ultimately did not lead to any improvements of the model. The response profiles can be found in the supplementary information (Supplementary Figures 4–7) and the correlations between the models and the MEG components are presented in Figure 6. The vocabulary of the final model (10 000) exceeds the number of units in its fully-connected layers, which means that a bottleneck is created in which a sub-lexical representation is formed. The number of units in the fully-connected layers, i.e. the width of the bottleneck, has some effect on the correlation between model and brain (Figure 6A), and the amount of noise added to the unit activations less so (Figure 6B). We already saw that the size of the vocabulary, i.e. the number of wordforms in the training data and number of units in the output layer of the model, had a large effect on the response profiles (Figure 4). Having a large vocabulary is of course desirable from a functional point of view, but it also modestly improves the correlation between model and brain (Figure 6C). For large vocabularies, we found it beneficial to apply frequency-balancing of the training data, meaning that the number of times a word-form appears in the training data is scaled according to its frequency in a large text corpus. However, this cannot be a one-to-one scaling, since the most frequent words occur so much more often than other words that the training data would consist mostly of the ten most common words, with less common words occurring only once or not at all. Therefore, we decided to scale not by the frequency 𝑓 directly, but by 𝑓^𝑠, where 0 < 𝑠 < 1, opting for 𝑠 = 0.2 for the final model (Figure 6D).”
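      The frequency-balancing scheme described in this passage can be sketched as follows. The sketch assumes that each word form is drawn for the training set with probability proportional to 𝑓^𝑠, and it uses an idealized Zipf distribution (frequency proportional to 1/rank) as a hypothetical stand-in for the corpus frequencies; the function `sampling_weights` and the training-set size are introduced for illustration only. It compares how much of the training data the ten most frequent words would take up with one-to-one scaling (𝑠 = 1) and with 𝑠 = 0.2.

```python
import numpy as np


def sampling_weights(frequencies, s=0.2):
    """Sampling probability per word form, proportional to frequency ** s (0 < s <= 1)."""
    weights = np.asarray(frequencies, dtype=float) ** s
    return weights / weights.sum()


# Hypothetical Zipfian corpus frequencies for a vocabulary of 10 000 word forms.
vocab_size = 10_000
zipf_freqs = 1.0 / np.arange(1, vocab_size + 1)

for s in (1.0, 0.2):
    p = sampling_weights(zipf_freqs, s=s)
    print(f"s = {s}: share of training data taken by the top ten words = {p[:10].sum():.3f}")

# Draw word-form indices for a (hypothetical) training set of 100 000 examples.
rng = np.random.default_rng(0)
training_words = rng.choice(vocab_size, size=100_000, p=sampling_weights(zipf_freqs))
```

      Under this idealized distribution, the ten most frequent words take up roughly 30% of the samples with one-to-one scaling, whereas with 𝑠 = 0.2 their share drops below 1%, leaving room for the rest of the vocabulary.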

      (2) It is not clear which layers exactly correspond to which of the three response components. For this to be clearer, it would have been nice to have a plot with all the layers of VGG on the x-axis and three curves corresponding to the correlation of each layer with each of the three response components.

      This is a great suggestion that we were happy to incorporate in the revised version of the manuscript. Every figure comparing the response patterns of the model and brain now includes a panel depicting the correlation between each layer of the model and each of the three MEG components (Figures 4 & 5, Supplementary Figures 2-5). This has given us (and now also the reader) the ability to better benchmark the different models quantitatively, adding to our discussion on qualitative to quantitative analysis.

      (3) It is not clear to me why the authors report the correlation of all layers with the MEG responses in Figure 5: why not only report the correlation of the final layers for N400, and that of the first layers for type-I?

      We agree with the reviewer that it would have been better to compare the correlation scores for those layers whose response profiles match the MEG component. While the old Figure 5 has been merged with Figure 4, and now provides the correlations between all the layers and all MEG components, we have taken the Reviewer’s advice and marked the layers which qualitatively best correspond to each MEG component, so the reader can take that into account when interpreting the correlation scores.

      (4) The authors mention that the reason that they did not reproduce the protocol with more advanced vision models is that they needed the minimal setup capable of yielding the desired experiment effect. I am not fully convinced by this and think the paper could be significantly strengthened by reporting results for a vision transformer, in particular to study the role of attention layers which are expected to play an important role in processing higher-level features.

      We appreciate and share the Reviewer’s enthusiasm in seeing how other model architectures would fare when it comes to modeling MEG components. However, we regard modifying the core model architecture (i.e., a series of convolution-and-pooling followed by fully-connected layers) to be out of scope for the current paper.

      One of the key points of our study is to create a model that reproduces the experimental effects of an existing MEG study, which necessitates modeling the initial feed-forward processing from pixel to word-form. For this purpose, a convolution-and-pooling model was the obvious choice, because these operations play a big role in cognitive models of vision in general. In order to properly capture all experimental contrasts in the MEG study, many variations of the CNN were trained and evaluated. This iterative design process concluded when all experimental contrasts could be faithfully reproduced.

      If we were to explore different model architectures, such as a transformer architecture, reproducing the experimental contrasts of the MEG study would no longer be the end goal, and it would be unclear what the end goal should be. Maximizing correlation scores has no end, and there are a nearly endless number of model architectures one could try. We could bring in a second MEG study with experimental contrasts that the CNN cannot explain and a transformer architecture potentially could and set the end goal to explain all experimental effects in both MEG studies. But even if we had access to such a dataset, this would almost double the length of the paper, which is already too long.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Insects and their relatives are commonly infected with microbes that are transmitted from mothers to their offspring. A number of these microbes have independently evolved the ability to kill the sons of infected females very early in their development; this male killing strategy has evolved because males are transmission dead-ends for the microbe. A major question in the field has been to identify the genes that cause male killing and to understand how they work. This has been especially challenging because most male-killing microbes cannot be genetically manipulated. This study focuses on a male-killing bacterium called Wolbachia. Different Wolbachia strains kill male embryos in beetles, flies, moths, and other arthropods. This is remarkable because how sex is determined differs widely in these hosts. Two Wolbachia genes have been previously implicated in male-killing by Wolbachia: oscar (in moth male-killing) and wmk (in fly male-killing). The genomes of some male-killing Wolbachia contain both of these genes, so it is a challenge to disentangle the two.

      This paper provides strong evidence that oscar is responsible for male-killing in moths. Here, the authors study a strain of Wolbachia that kills males in a pest of tea, Homona magnanima. Overexpressing oscar, but not wmk, kills male moth embryos. This is because oscar interferes with masculinizer, the master gene that controls sex determination in moths and butterflies. Interfering with the masculinizer gene in this way leads the (male) embryo down a path of female development, which causes problems in regulating the expression of genes that are found on the sex chromosomes.

      We would like to thank you for evaluating our manuscript.

      Strengths:

      The authors use a broad range of approaches to implicate oscar and to dissect its mechanism of male lethality. These approaches include: a) overexpressing oscar (and wmk) by injecting RNA into moth eggs, b) determining the sex of embryos by staining female sex chromosomes, c) determining the consequences of oscar expression by assaying sex-specific splice variants of doublesex, a key sex determination gene, and by quantifying gene expression and dosage of sex chromosomes, using RNA-seq, and d) expressing oscar along with masculinizer from various moth and butterfly species, in a silkmoth cell line. This extends recently published studies implicating oscar in male-killing by Wolbachia in Ostrinia corn borer moths, although the Homona and Ostrinia oscar proteins are quite divergent. Combined with other studies, there is now broad support for oscar as the male-killing gene in moths and butterflies (i.e., order Lepidoptera). An outstanding question, therefore, is the role of wmk. Is it the master male-killing gene in insects other than Lepidoptera, and if so, how does it operate?

      We would like to thank you for evaluating our manuscript. Our data demonstrated that Oscar homologs play important roles in male-killing phenotypes in moths and butterflies; however, the functional relevance of wmk remains uncertain. As you noted, whether wmk acts as a male-killing gene in insects such as flies and beetles—or even in certain lepidopteran species—requires further investigation using diverse insect models, which we are eager to explore in future research.

      Weaknesses:

      I found the transfection assays of oscar and masculinizer in the silkworm cell line (Figure 4) to be difficult to follow. There are also places in the text where more explanation would be helpful for non-experts.

      Thank you for your suggestion. We have revised the section on the cell-based experiment. Further, we revised the manuscript to make it accessible to a broader audience. We believe these revisions have significantly improved the clarity and comprehensiveness of our manuscript.

      Reviewer #2 (Public review):

      Summary:

      Wolbachia are maternally transmitted bacteria that can manipulate host reproduction in various ways. Some Wolbachia induce male killing (MK), where the sons of infected mothers are killed during development. Several MK-associated genes have been identified in Homona magnanima, including Hm-oscar and wmk-1-4, but the mechanistic links between these Wolbachia genes and MK in the native host are still unclear.

      In this manuscript, Arai et al. show that Hm-oscar is the gene responsible for Wolbachia-induced MK in Homona magnanima. They provide evidence that Hm-Oscar functions through interactions with the sex determination system. They also found that Hm-Oscar disrupts sex determination in male embryos by inducing female-type dsx splicing and impairing dosage compensation. Additionally, Hm-Oscar suppresses the function of Masc. The manuscript is well-written and presents intriguing findings. The results support their conclusions regarding the diversity and commonality of MK mechanisms, contributing to our understanding of the mechanisms and evolutionary aspects of Wolbachia-induced MK.

      We would like to thank you for evaluating our manuscript.

      Comments on revisions:

      The authors have already addressed the reviewer's concerns.

      We would like to thank you for evaluating our manuscript.

      Reviewer #3 (Public review):

      Summary:

      Overall, this is a clearly written manuscript with nice hypothesis testing in a non-model organism that addresses the mechanism of Wolbachia-mediated male killing. The authors aim to determine how five previously identified male-killing genes (encoded in the prophage region of the wHm Wolbachia strain) impact the native host, Homona magnanima moths. This work builds on the authors' previous studies in which

      (1) they tested the impact of these same wHm genes via heterologous expression in Drosophila melanogaster

      (2) also examined the activity of other male-killing genes (e.g., from the wFur Wolbachia strain in its native host: Ostrinia furnacalis moths).

      Advances here include identifying which wHm gene most strongly recapitulates the male-killing phenotype in the native host (rather than in Drosophila), and the finding that the Hm-Oscar protein has the potential for male-killing in a diverse set of lepidopterans, as inferred by the cell-culture assays.

      We would like to thank you for evaluating our manuscript.

      Strengths:

      Strengths of the manuscript include the reverse genetics approaches to dissect the impact of specific male-killing loci, and use of a "masculinization" assay in Lepidopteran cell lines to determine the impact of interactions between specific masc and oscar homologs.

      We would like to thank you for evaluating our manuscript.

      Weaknesses:

      It is clear from Figure 1 that the combinations of wmk homologs do not cause male killing on their own here. While I largely agree with the authors' conclusions that oscar is the primary MK factor in this system, I don't think we can yet rule out that wmk(s) may work synergistically or interactively with oscar in vivo. This might be worth a small note in the discussion (e.g., at line 294, "indicating that wmk likely targets factors other than masc": this could be downstream of the impacts of oscar, perhaps dependent on oscar-mediated impacts on masc first).

      We sincerely appreciate your suggestion. Although the wmk genes themselves did not exhibit apparent lethal effects in the native host, as you noted, we cannot entirely rule out the possibility that wmk is involved in male killing, either directly or by indirectly assisting the function of Hm-oscar. Following your suggestion, we have added a brief note to the discussion section regarding the interpretation of wmk functions.

      “In addition, Katsuma et al. (2022) reported that the wmk homologs encoded by wFur did not affect the masculinizing function of masc in vitro, indicating that wmk likely targets factors other than masc. Whilst we cannot rule out the possibility that wmk may work synergistically or interactively with oscar in vivo—potentially acting downstream of oscar’s impact—our results strongly suggested that Wolbachia strains have acquired multiple MK genes through evolution.” (lines 287-292)

      Regarding the perceived male-bias in Figure 2a: I think readers might be interpreting "unhatched" as "total before hatching". You could eliminate ambiguity by perhaps splitting the bars into male and female, and then within a bar, coloring by hatched versus unhatched. But this is a minor point, and I think the updated text helps clarify this.

      Thank you for your suggestion. We have revised Figure 2a accordingly. In addition, we have added more detailed information to the first sentence of the section “Males are killed mainly at the embryonic stage.”

      “The sex of hatched larvae (neonates) and the remaining unhatched embryos was determined by the presence or absence of W chromatin, a condensed structure of the female-specific W chromosome observed during interphase.” (lines 171-173)

      The new Figure 4b looks to be largely redundant with the oscar information in Figure 1a.

      Thank you for your suggestion. We have removed Figure 4b due to its overlap with Figure 1a and have incorporated relevant figure legends into the Figure 1a legend.

      Updated statistical comparisons for the RNA-seq analysis are helpful. However, these analyses are based on single libraries (albeit each a pool of many individuals), so this is still a weaker aspect of the manuscript.

      Thank you for your suggestion. As you noted, the use of single libraries (owing to the limited number of available individuals, although each library includes approximately 50 males and females) may be a limitation of this study. However, as demonstrated by the qPCR assay for the Z-linked gene provided in the previous revision, we believe that our data and conclusion (that Wolbachia/Hm-oscar disrupts dosage compensation by causing overexpression of Z-linked genes) are well-supported and robust.

      The new information on masc similarity is useful (Fig 4d) - if the authors could please include a heatmap legend for the colors, that would be helpful. Also, please avoid green and red in the same figure when key for interpretation.

      Thank you for your suggestion. We have accordingly included a heatmap legend and revised the colors.

      Figure 1A "helix-turn-helix" is misspelled. ("tern").

      We have revised.

      Recommendations for the authors:

      Comments from the reviewing editor: I would suggest you address the comments of the reviewer on the revised version.

      We have further revised the manuscript to address all the questions, comments and suggestions provided by the reviewers. We believe that the resulting revisions have significantly enhanced the quality and comprehensiveness of our manuscript.

      Reviewer #1 (Recommendations for the authors):

      Thank you for revising this manuscript. I have a few last recommendations:

      - Line 214: re: 'Statistical data are available in the supplementary data file', it would be more helpful to add a few words here that actually summarize the statistical results

      We would like to thank you for your suggestion. We have revised the sentence to describe the overview of the statistical results.

      “RNA-seq analysis revealed that, in Hm-oscar-injected embryos, Z-linked genes (homologs on the B. mori chromosomes 1 and 15) were more expressed in males than in females (Fig. 3a), which was not observed in the GFP-injected group (Fig. 3b). Similarly, as previously reported by Arai et al. (2023a), high levels of Z-linked gene expression were also observed in wHm-t-infected males, but not in NSR males (Fig. 3c,d). The high (i.e., doubled) Z-linked gene expression in both Hm-oscar-expressed and wHm-t-infected males was further confirmed by quantification of the Z-linked Hmtpi gene (Fig. 3e). These trends were statistically supported, with all data available in the supplementary data file.” (lines 205-213)
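      The comparison summarised in this revised text can be illustrated with a minimal sketch (ours, not the authors' analysis pipeline): compute a log2 male:female expression ratio per gene and take its median separately for Z-linked and autosomal genes. With working dosage compensation both medians should sit near 0, whereas a Z-linked median near 1 corresponds to the doubled Z-linked expression in males described above. The gene names, expression values, and the pseudocount of 1 are hypothetical.

```python
import math
from statistics import median


def log2_mf(male_expr: float, female_expr: float, pseudocount: float = 1.0) -> float:
    """log2 male:female expression ratio; the pseudocount avoids division by zero."""
    return math.log2((male_expr + pseudocount) / (female_expr + pseudocount))


def median_ratio_by_group(rows):
    """rows: iterable of (gene, group, male_expr, female_expr); group is e.g. 'Z' or 'autosome'."""
    ratios = {}
    for _gene, group, male_expr, female_expr in rows:
        ratios.setdefault(group, []).append(log2_mf(male_expr, female_expr))
    return {group: median(values) for group, values in ratios.items()}


# Hypothetical expression values (e.g. TPM), for illustration only.
toy = [
    ("geneZ1", "Z", 200.0, 100.0),
    ("geneZ2", "Z", 60.0, 30.0),
    ("geneZ3", "Z", 410.0, 200.0),
    ("geneA1", "autosome", 100.0, 100.0),
    ("geneA2", "autosome", 50.0, 52.0),
    ("geneA3", "autosome", 300.0, 290.0),
]
summary = median_ratio_by_group(toy)
print(summary)
```

      In this toy input the Z-linked median is close to 1 (males express roughly twice as much) and the autosomal median is 0, the pattern that a statistical test on real libraries would then need to support.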

      - Figure 1 legend: do you mean 'bridged' instead of 'brigged'?

      We have revised this accordingly; thank you for the suggestion.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      (1) The results do not support the conclusions. The main "selling point" as summarized in the title is that the apoptotic rate of zebrafish motorneurons during development is strikingly low (~2% ) as compared to the much higher estimate (~50%) by previous studies in other systems. The results used to support the conclusion are that only a small percentage (under 2%) of apoptotic cells were found over a large population at a variety of stages 24-120hpf. This is fundamentally flawed logic, as a short-time window measure of percentage cannot represent the percentage on the long-term. For example, at any year under 1% of human population die, but over 100 years >99% of the starting group will have died. To find the real percentage of motorneurons that died, the motorneurons born at different times must be tracked over long term, or the new motorneuron birth rate must be estimated. Similar argument can be applied to the macrophage results.

      In the revised manuscript (revised Figure 4), we extended the observation time window as far as possible, from 24 hpf to 240 hpf. After 240 hpf, the transparency of the zebrafish body decreased dramatically, which made optical imaging quite difficult.

      We are confident that this 24-240 hpf window covers the main period during which motor neurons undergo programmed cell death in early zebrafish development. We chose this window for two reasons. 1) Previous studies showed that although the period of motor neuron death varies across species (chick, E5-E10; mouse, E11.5-E15.5; rat, E15-E18; human, 11-25 weeks of gestation), in every case it coincides with the developmental period when motor neurons make contact with muscle cells. In zebrafish, motor neurons contact muscle cells before 72 hpf, which falls within our observation window. 2) Most zebrafish organs form by 48-72 hpf, and larvae hatch between 48 and 72 hpf. Food-seeking and active avoidance behaviors also begin at 72 hpf, indicating that motor neurons are fully functional by then.

      Previous studies in zebrafish have shown that the production of spinal cord motor neurons largely ceases before 48 hpf, after which motor neuron numbers remain largely constant until adulthood (doi: 10.1016/j.celrep.2015.09.050; 10.1016/j.devcel.2013.04.012; 10.1007/BF00304606; 10.3389/fcell.2021.640414). Our observation window covers the main period of motor neuron production. Therefore, we believe that neurogenesis does not affect our findings and conclusions.

      Although we are confident that 240 h tracking is long enough to measure the motor neuron death rate, several sentences have been added in the discussion part, “In our manuscript, we tracked the motor neuron death in live zebrafish until 240 hpf, which was the longest time window we could achieve. But there was still a possibility that zebrafish motor neurons might die after 240 hpf.”

      We agree that the “2%” figure might not be very accurate. Thus, we have revised our title to “Zebrafish live imaging reveals a surprisingly small percentage of spinal cord motor neurons die during early development.”
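      Comment (1) turns on the difference between the fraction of neurons seen dying within a given window and the cumulative fraction lost over development. A minimal worked sketch, under two assumptions of ours: the population is closed (no new neurons are added, which the response argues holds after about 48 hpf), and each rate p_t is the fraction of still-surviving neurons that die in window t. The cumulative loss is then 1 - prod(1 - p_t). The rates used below are hypothetical.

```python
from math import prod


def cumulative_loss(per_window_rates):
    """Fraction of an initial, closed population lost after successive windows.

    Each rate is the fraction of *surviving* cells that die in that window,
    so survival compounds multiplicatively across windows.
    """
    survival = prod(1.0 - rate for rate in per_window_rates)
    return 1.0 - survival


# Hypothetical: 2% of surviving neurons die in each of 10 consecutive windows.
loss = cumulative_loss([0.02] * 10)
print(f"cumulative loss: {loss:.3f}")
```

      Under these assumptions a 2% loss per window compounds to roughly 18% over ten windows, which is why the length of the tracking window, and whether deaths are counted cumulatively or as snapshots, matters for the comparison with the ~50% estimates from other systems.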

      (2) The conclusion regarding the timing of axon and cell body caspase activation and apoptosis also has clear issues. Minute-scale measurements are too coarse compared with the transport/diffusion timescale between the cell body and the axon: caspase could have been activated in the cell body, and either the caspase or the cleaved sensor could then move to the axon within several seconds. The authors' results do not have a high enough temporal resolution to resolve these dynamics. Many statements suggest an oversight of the literature, for example, in the abstract: "however, there is still no real-time observation showing this dying process in live animals."

      Real-time imaging of live animals is quite challenging in the field. Currently, using confocal microscopy, we can only achieve minute-scale tracking. In the future, with more advanced imaging techniques, the sensor fish in the present study may provide us with more detailed information on motor neuron death. We have removed “real-time” from our revised manuscript. We also revised the mentioned sentence in the abstract.

      (3) Many statements should use more scholarly terms and descriptions from the spinal cord, motor neuron, or neuromuscular development fields, such as line 87: "their axons converged into one bundle to extend into individual somite, which serves as a functional unit for the development and contraction of muscle cells."

      We have removed this sentence.

      (4) The transgenic line is perhaps the most meaningful contribution to the field as the work stands. However, mnx1 promoter is well known for its non-specific activation - while the images do suggest the authors' line is good, motorneuron markers should be used to validate the line. This is especially important for assessing this population later as mnx1 may be turned off in mature neurons. The author's response regarding mnx1 specificity does not mitigate the original concern.

      The mnx1 promoter has been widely used to label motor neurons in transgenic zebrafish. Previous studies have shown that most of the cells labeled in the mnx1 transgenic zebrafish are motor neurons. In this study, we observed that the neuronal cells in our sensor zebrafish formed green cell bodies inside of the spinal cord and extended to the muscle region, which is an important morphological feature of the motor neurons.

      Furthermore, a few of those green cell bodies turned into blue apoptotic bodies inside the spinal cord and changed to blue axons in the muscle regions at the same time, which strongly suggests that those apoptotic neurons are not interneurons.

      In fact, no matter what method is used, such as using antibodies to stain specific markers to label motor neurons, 100% specificity cannot be achieved. More importantly, although the mnx1 promoter might have labeled some interneurons, this will not affect our major finding that only a small percentage of spinal cord motor neurons die during the early development of zebrafish.

      Reviewer 2:

      (1) Title: The 50% figure of motor neurons dying through apoptosis during early vertebrate development is not precisely accurate. In papers referenced by the authors, there is a wide distribution of percentages of motor neurons that die depending on the species and the spinal cord region. In addition, the authors did not examine limb-innervating motor neurons, which are the ones best studied in motor neuron programmed cell death in other species. Thus, a better title that reflects what they actually show would be something like "A surprisingly small percentage of early developing zebrafish motor neurons die through apoptosis in non-limb innervating regions of the spinal cord."

      In fish, there are no limbs, although fins may be evolutionarily related to limbs. In our manuscript, we studied naturally occurring motor neuron death throughout the spinal cord during early zebrafish development. The death of limb-innervating motor neurons has been studied extensively in chicks and rodents because limbs are readily accessible to experimental manipulations such as amputation. However, previous studies have shown that this dramatic motor neuron death occurs not only in limb-innervating motor neurons but also in other spinal cord motor neurons (doi: 10.1006/dbio.1999.9413).

      We have revised our title to “Zebrafish live imaging reveals a surprisingly small percentage of spinal cord motor neurons die during early development.”

      (2) lines 18-19: "embryonic stage of vertebrates" is very broad, since zebrafish are also vertebrates; it would be better to be more specific

      lines 25-26: The authors should be more specific about which animals have widespread neuronal cell death.

      We have revised our manuscript accordingly.

      (3) lines 98-99; 110-111; 113; 122-123; 140-141: A cell can undergo apoptosis. But an axon, which is only part of a cell, cannot undergo apoptosis. Especially since the axon doesn't have a separate nucleus, and the definition of apoptosis usually includes nuclear fragmentation. A better subheading would describe the result, which is that caspase activation is seen in both the cell body and the axon.

      We have revised the subheadings and related words in the manuscript accordingly. In the introduction, we also revised the expression of the third aim from “Which part of a neuron (cell body vs. axon) will die first?” to “Which part of a neuron (cell body vs. axon) will degrade first?”.

      (4) lines 159-160; 178-179: This is an oversimplification of the literature. The authors should spell out which populations of motor neuron have been examined and say something about the similarities and difference in motor neuron death.

      We have revised it accordingly.

      (5) lines 200; 216: The authors did not observe macrophages engulfing motor neurons. But that does not mean that they cannot. Making the conclusion stated in this subheading would require some kind of experiment, not just observations.

      We observed colocalization of macrophages with dead motor neurons only rarely. To describe these data more accurately, in the revised manuscript we replaced “engulfment” with “colocalization.” The subheading has been revised to “Most dead motor neurons were not colocalized with macrophages,” and panel C of Figure 5 has been revised accordingly.

      (6) lines 234-246: The authors seem to have missed the point about VaP motor neuron death, which was two-fold. First, VaP death has been previously described, thus it could serve as a control for the work in this paper, especially since the conditions underlying VaP death and survival have been experimentally tested. Second, they should acknowledge that previous work showed that at least some motor neuron death in zebrafish differs from that described in chick and rodents. This conclusion came from work showing that death of VaP is independent of limitations in muscle innervation area, suggesting it is not coupled to muscle-derived neurotrophic factors.

      Figures: The authors should say which level of the spinal cord they examined in each figure.

      We have compared our findings with previous findings in the revised manuscript. The death of VaP motor neurons is not related to neurotrophic factors, but the death of other motor neurons may be related to neurotrophic factors, which needs further study and evidence. Our study examined the overall motor neuron apoptosis regardless of the causes and locations. To avoid misunderstanding, in the revised manuscript, we removed the data and words related to neurotrophic factors.

      We also extended the observation time window as far as possible, from 24 hpf to 240 hpf (revised Figure 4). After 240 hpf, the transparency of the zebrafish body decreased dramatically, which made optical imaging quite difficult.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Experiments in model organisms have revealed that the effects of genes on heritable traits are often mediated by environmental factors---so-called gene-by-environment (or GxE) interactions. In human genetics, however, where indirect statistical approaches must be taken to detect GxE, limited evidence has been found for pervasive GxE interactions. The present manuscript argues that the failure of statistical methods to detect GxE may be due to how GxE is modelled (or not modelled) by these methods.

      The authors show, via re-analysis of an existing dataset in Drosophila, that a polygenic ‘amplification’ model can parsimoniously explain patterns of differential genetic effects across environments. (Work from the same lab had previously shown that the amplification model is consistent with differential genetic effects across the sexes for several traits in humans.) The parsimony of the amplification model allows for powerful detection of GxE in scenarios in which it pertains, as the authors show via simulation.

      Before the authors consider polygenic models of GxE, however, they present a very clear analysis of a related question around GxE: When one wants to estimate the effect of an individual allele in a particular environment, when is it better to stratify one’s sample by environment (reducing sample size, and therefore increasing the variance of the estimator) versus using the entire sample (including individuals not in the environment of interest, and therefore biasing the estimator away from the true effect specific to the environment of interest)? Intuitively, the sample-size cost of stratification is worth paying if true allelic effects differ substantially between the environment of interest and other environments (i.e., GxE interactions are large), but not worth paying if effects are similar across environments. The authors quantify this trade-off in a way that is both mathematically precise and conveys the above intuition very clearly. They argue on its basis that, when allelic effects are small (as in highly polygenic traits), single-locus tests for GxE may be substantially underpowered.
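      The stratify-versus-pool trade-off described here can be illustrated with a minimal sketch. The simplifying assumptions are ours, not necessarily the manuscript's: equal residual variance σ² and equal genotype variance Var(g) in both contexts, no intercept difference, and a pooled OLS slope whose expectation is the sample-size-weighted average of β_A and β_B. The stratified estimator of β_A is then unbiased with variance σ²/(n_A Var(g)), while the pooled estimator has smaller variance σ²/((n_A + n_B) Var(g)) but bias w_B(β_B - β_A).

```python
def mse_stratified(sigma2: float, n_a: int, var_g: float) -> float:
    """MSE of the context-A-only estimate of beta_A: unbiased, so MSE equals its variance."""
    return sigma2 / (n_a * var_g)


def mse_pooled(beta_a: float, beta_b: float, sigma2: float,
               n_a: int, n_b: int, var_g: float) -> float:
    """MSE of the whole-sample slope used as an estimate of beta_A (simplified model)."""
    w_b = n_b / (n_a + n_b)                      # weight of context B in the pooled slope
    bias = w_b * (beta_b - beta_a)               # pooled slope is pulled toward beta_B
    variance = sigma2 / ((n_a + n_b) * var_g)    # larger sample, smaller variance
    return bias ** 2 + variance


def prefer_stratified(beta_a: float, beta_b: float, sigma2: float,
                      n_a: int, n_b: int, var_g: float) -> bool:
    """True if estimating beta_A from context A alone has the lower MSE."""
    return mse_stratified(sigma2, n_a, var_g) < mse_pooled(beta_a, beta_b, sigma2, n_a, n_b, var_g)


# Equal sample sizes, sigma^2 = 1, Var(g) = 0.5 (allele frequency 0.5 under Hardy-Weinberg).
for beta_b in (0.10, 0.08, 0.0):
    print(beta_b, prefer_stratified(0.10, beta_b, 1.0, 1000, 1000, 0.5))
```

      With these numbers the stratified MSE is 0.002 and the pooled variance is 0.001, so pooling wins unless |w_B(β_B - β_A)| exceeds about 0.032, i.e. unless the two effects differ by more than about 0.063. This mirrors the intuition above: small allelic differences favour the whole-sample estimate.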

      The paper is an important further demonstration of the plausibility of the amplification model of GxE, which, given its parsimony, holds substantial promise for the detection and characterization of GxE in genomic datasets. However, the empirical and simulation examples considered in the paper (and previous work from the same lab) are somewhat “best-case” scenarios for the amplification model, with only two environments, and with these environments amplifying equally the effects of only a single set of genes. It would be an important step forward to demonstrate the possibility of detecting amplification in more complex scenarios, with multiple environments each differentially modulating the effects of multiple sets of genes. This could be achieved via simulations similar to those presented in the current manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Wine et al. describe a framework to view the estimation of gene-context interaction analysis through the lens of bias-variance tradeoff. They show that, depending on trait variance and context-specific effect sizes, effect estimates may be estimated more accurately in context-combined analysis rather than in context-specific analysis. They proceed by investigating, primarily via simulations, implications for the study or utilization of gene-context interaction, for testing and prediction, in traits with polygenic architecture. First, the authors describe an assessment of the identification of context-specificity (or context differences) focusing on “top hits” from association analyses. Next, they describe an assessment of polygenic scores (PGSs) that account for context-specific effect sizes, showing, in simulations, that often the PGSs that do not attempt to estimate context-specific effect sizes have superior prediction performance. An exception is a PGS approach that utilizes information across contexts.

      Strengths:

      The bias-variance tradeoff framing of GxE is useful, interesting, and rigorous. The PGS analysis under pervasive amplification is also interesting and demonstrates the bias-variance tradeoff.

      Weaknesses:

      The weakness of this paper is that the first part -- the bias-variance tradeoff analysis -- is not tightly connected to, i.e. not sufficiently informing, the later parts, that focus on polygenic architecture. For example, the analysis of “top hits” focuses on the question of testing, rather than estimation, and testing was not discussed within the bias-variance tradeoff framework. Similarly, while the PGS analysis does demonstrate (well) the bias-variance tradeoff, the reader is left to wonder whether a bias-variance deviation rule (discussed in the first part of the manuscript) should or could be utilized for PGS construction.

      We thank the editors and the reviewers for their thoughtful critique and helpful suggestions throughout. In our revision, we focused on tightening the relationship between the analytical single variant bias-variance tradeoff derivation and the various empirical analyses that follow.

      We improved our discussion of what is within and beyond our scope. For example, if our language suggested to the editor and reviewers that we were developing a method to characterize polygenic GxE, it was insufficiently clear: developing such a method (let alone evaluating its performance across various scenarios) is beyond the scope of this manuscript.

      Similarly, we clarify that we use amplification only as an example of a mode of GxE that is not adequately characterized by current approaches. We do not wish to argue it is an omnibus explanation for all GxE in complex traits. In many cases, a mixture of polygenic GxE relationships seems most fitting (as observed, for example, in Zhu et al., 2023, for GxSex in human physiology).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      MAJOR COMMENT

      The amplification model is based on an understanding of gene networks in which environmental variables concertedly alter the effects of clusters of genes, or modules, in the network (e.g., if an environmental variable alters the effect of some gene, it indirectly and proportionately alters the effects of genes downstream of that gene in the network---or upstream if the gene acts as a bottleneck in some pathway). It is clear in this model that (i) multiple environmental variables could amplify distinct modules, and (ii) a single environmental variable could itself amplify multiple separate modules, with a separate amplification factor for each module.
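      The scenario in point (ii), a single environment amplifying several modules with separate factors, can be sketched as a toy generative model. This is a hypothetical illustration, not the manuscript's simulation: each variant belongs to at most one module, and its effect in context B equals its context-A effect multiplied by that module's factor (1 for variants outside any amplified module). The module labels, factors, and effect-size distribution are invented for illustration.

```python
import numpy as np


def amplify(beta_a: np.ndarray, module_of: np.ndarray, factors: dict) -> np.ndarray:
    """Context-B effects: each variant's context-A effect scaled by its module's factor.

    module_of[i] is a module label, or -1 for variants outside any amplified module
    (these keep a factor of 1).
    """
    scale = np.array([factors.get(int(m), 1.0) for m in module_of])
    return beta_a * scale


def slope_through_origin(x: np.ndarray, y: np.ndarray) -> float:
    """OLS slope of y on x without an intercept."""
    return float(x @ y / (x @ x))


rng = np.random.default_rng(1)
n_variants = 1000
beta_a = rng.normal(0.0, 0.05, size=n_variants)
module_of = rng.choice([-1, 0, 1], size=n_variants, p=[0.5, 0.3, 0.2])
factors = {0: 1.4, 1: 0.7}          # hypothetical per-module amplification factors
beta_b = amplify(beta_a, module_of, factors)

# With module labels known, regressing beta_B on beta_A within each module recovers its factor.
recovered = {m: slope_through_origin(beta_a[module_of == m], beta_b[module_of == m])
             for m in (-1, 0, 1)}
print(recovered)
```

      A single regression over all variants would instead return one blended slope. In practice the module labels are unknown and effects are estimated with noise, which is where the difficulty raised in this comment lies.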

      However, perhaps inspired by their previous work on GxSex interactions in humans, the authors’ focus in the present manuscript is on cases where there are only two environments (“control” and “high-sugar diet” in the Drosophila dataset that they reanalyze, and “A” and “B” in their simulations [and single-locus mathematical analysis]), and they consider models where these environments amplify only a single set of genes, i.e., with a single amplification factor. While it is of course interesting that a single-amplification-factor model can generate data that resemble those in the Drosophila dataset that the authors re-analyze, most scenarios of amplification GxE will presumably be more complex. It seems that detecting amplification in these more complex scenarios using methods such as the authors do in their final section will be correspondingly more difficult. Indeed, in the limit of sufficiently many environmental variables amplifying sufficiently many modules, the scenario would resemble one of idiosyncratic single-locus GxE which, as the authors argue, is very difficult to detect. That more complex scenarios of amplification, with multiple environments separately amplifying multiple modules each, might be difficult to detect statistically is potentially an important limitation to the authors’ approach, and should be tested in their simulations.

      We agree that characterizing GxE when there is a mixture of drivers of context-dependency is difficult. Developing a method that does so across multiple (and perhaps not pre-defined) contexts is of high interest to us but is beyond the scope of the current manuscript.

      We note that for GxSex, modeling this mixture does generally improve phenotypic prediction, and more so in traits where we infer amplification as a major mode of GxE.

      MINOR COMMENTS

      Lines 88-90: “This estimation model is equivalent to a linear model with a term for the interaction between context and reference allele count, in the sense that context-specific allelic effect estimators have the same distributions in the two models.”

      Does this equivalence require the model with the interaction term also to have an interaction term for the intercept, i.e., the slope on a binary variable for context (since the generative model in Eq. 1 allows for context-specific intercepts)?

      It does require an interaction term for the intercept. This is e_i (and its effect beta_E) in Eq. S2 (line 70 of the supplement).
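      The equivalence of the point estimates can be checked numerically. The sketch below is a hypothetical illustration (simulated genotypes and made-up effect sizes, not the authors' data or code): it fits one OLS regression per context and one pooled regression that includes a context main effect plus a genotype-by-context term, then compares the context-specific slopes. It checks point estimates only; whether the sampling distributions also coincide depends on how residual variance is modeled in each fit.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def compare_estimators(seed=1, n=2000):
    rng = np.random.default_rng(seed)
    e = rng.integers(0, 2, size=n)          # context indicator: 0 = A, 1 = B
    g = rng.binomial(2, 0.3, size=n)        # reference allele count
    beta_a, beta_b, beta_e = 0.2, 0.5, 1.0  # hypothetical true values
    y = beta_e * e + np.where(e == 0, beta_a, beta_b) * g + rng.normal(size=n)

    # (1) Separate regressions per context, each with its own intercept
    separate = []
    for ctx in (0, 1):
        m = e == ctx
        X = np.column_stack([np.ones(m.sum()), g[m]])
        separate.append(ols(X, y[m])[1])

    # (2) Pooled regression: intercept, g, context main effect, g x context
    coef = ols(np.column_stack([np.ones(n), g, e, g * e]), y)
    interaction = [coef[1], coef[1] + coef[3]]

    # (3) Pooled regression without the context main effect (shared intercept)
    coef_shared = ols(np.column_stack([np.ones(n), g, g * e]), y)
    shared_intercept = [coef_shared[1], coef_shared[1] + coef_shared[2]]
    return separate, interaction, shared_intercept

sep, inter, shared = compare_estimators()
print(np.allclose(sep, inter))   # slopes match when the context main effect is included
print(np.allclose(sep, shared))  # forcing a shared intercept generally shifts the slopes
```

      Model (2) spans the same column space as fitting an intercept and slope separately in each context, which is why its slopes coincide with model (1); dropping the context main effect breaks that correspondence.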

      Lines 94-96: Perhaps just a language thing, but in what sense does the estimation model described in lines 92-94 “assume” a particular distribution of trait values in the combined sample? It’s just an OLS regression, and one can analyze its expected coefficients with reference to the generative model in Eq. 1, or any other model. To say that it “assumes” something presupposes its purpose, which is not clear from its description in lines 92-94.

      We corrected “assume” to “posit”.

      Lines 115-116: It should perhaps be noted that the weights wA and wB need not sum to 1.

      Indeed; it is now explicitly stated.

      Lines 154-160: I think the role of r could be made even clearer by also discussing why, when VA>>VB, it is better to use the whole-sample estimate of betaA than the sample-A-specific estimate (since this is a more counterintuitive case than the case of VA<<VB discussed by the authors).

      This is addressed in lines 153-154, stating: “Typically, this (VA<<VB) will also imply that the additive estimator is greatly preferable for estimating β_B , as β_B will be extremely noisy”

      Line 243 and Figure 4 caption: The text states that the simulated effects in the high-sugar environment are 1.1x greater than those in the control environment, while the caption states that they are 1.4x greater.

      We have corrected the text to be consistent with our simulations.

      TYPOS/WORDING

      Line 14: “harder to interpret” --> “harder-to-interpret”

      Line 22: We --> we

      Line 40: “as average effect” -> “as the average effect”?

      Line 57: “context specific” --> “context-specific”

      Line 139: “re-parmaterization” --> “re-parameterization”

      Lines 140, 158, 412: “signal to noise” --> “signal-to-noise”

      Figure 3C,D: “pule rate” --> “pulse rate”

      The caption of Figure 3: “conutinous” --> “continuous”

      Line 227: “a variant may fall” --> “a variant may fall into”

      Line 295: “conferring to more GxE” --> “conferring more GxE” or “corresponding to more GxE”? This is very pedantic, but I think “bias-variance” should be “bias--variance” throughout, i.e., with an en-dash rather than a hyphen.

      We have corrected all of the above typos.

      Reviewer #2 (Recommendations For The Authors):

      (This section repeats some of what I wrote earlier).

      - First polygenic architecture part: the manuscript focuses on “top hits” in trying to identify sets of variants that are context-specific. This “top hits” approach seems somewhat esoteric and, as written, not connected tightly enough to the bias-variance tradeoff issue. The first section of the paper, which focuses on the bias-variance trade-off, mostly deals with estimation. The “top hits” section deals with testing, which introduces additional issues that are due to thresholding. Perhaps the authors can think of ways to make the connection stronger between the bias-variance tradeoff part and the “top hits” part, e.g., by introducing testing earlier on and/or discussing estimation in addition to testing in the “top hits” part of the manuscript. The second polygenic architecture part: polygenic scores that account for interaction terms. Here the authors focused (well, also here) on pervasive amplification in simulations. This part combines estimation and testing (both the choice of variants and their estimated effects are important). In pervasive amplification, the idea is that causal variants are shared; the results may be different than in a model with context-specific effects, and variant selection may have a large impact. Still, I think that these simulations demonstrate the idea developed in the bias-variance tradeoff part of the paper, though the reader is left to wonder whether a bias-variance decision rule should or could be utilized for PGS construction.

      In both of these sections we discuss how the consideration of polygenic GxE patterns alters the conclusions based on the single-variant tradeoff. In the “top hits” section, we show that single-variant classification itself, based on a series of marginal hypothesis tests alone, can be misleading. The PGS prediction accuracy analysis shows that both approaches are beaten by the polygenic GxE estimation approach. Intuitively, this is because the consideration of polygenic GxE can mitigate both the bias and variance, as it leverages signals from many variants.

      We agree that the links between these sections of the paper were not sufficiently clear, and have added signposting to help clarify them (lines 176-180; lines 275-277; lines 316-321).

      - Simulation of GxDiet effects on longevity: the methods of the simulation are strange, or communicated unclearly. The authors’ report (page 17) poses a joint distribution of genetic effects (line 439), but then, they simulated effect estimates standard errors by sampling from summary statistics (line 445) rather than simulated data and then estimating effect and effect SE. Why pose a true underlying multivariate distribution if it isn’t used?

      We rewrote the Methods section “Simulation of GxDiet effects on longevity in Drosophila” to make our simulation approach clearer (lines 427-449). We are indeed simulating the true effects from the joint distribution proposed. However, in order to mimic the noisiness of the experiment in our simulations, we sample estimated effects from the true simulated effects, with estimation noise corresponding to that estimated in the Pallares et al. dataset (i.e., sampling estimation variances from the squares of empirical SEs).
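      The two-stage procedure described here (true effects first, then noisy estimates) could look roughly like the following sketch. It assumes a bivariate normal for the true effects and resamples standard errors independently for each variant and context; the covariance matrix and the placeholder SE array are hypothetical stand-ins, not values from the Pallares et al. dataset.

```python
import numpy as np

def simulate_effects(n_variants, cov, empirical_se, rng):
    """Hypothetical sketch of a two-context simulation with realistic estimation noise.

    cov          : 2x2 covariance of true effects (column 0 = control, column 1 = high-sugar).
    empirical_se : 1-D array of standard errors observed in a real dataset; each simulated
                   estimate receives a noise variance equal to one resampled SE squared.
    Returns true effects, noisy estimates, and the SEs used, each of shape (n_variants, 2).
    """
    true = rng.multivariate_normal(np.zeros(2), cov, size=n_variants)
    se = rng.choice(np.asarray(empirical_se, dtype=float), size=true.shape, replace=True)
    estimated = true + rng.normal(0.0, 1.0, size=true.shape) * se
    return true, estimated, se

rng = np.random.default_rng(2)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])           # hypothetical covariance of true effects
placeholder_se = rng.uniform(0.05, 0.2, size=500)  # stand-in for real empirical SEs
true, est, se = simulate_effects(1000, cov, placeholder_se, rng)
print(np.corrcoef(true[:, 0], est[:, 0])[0, 1])    # how closely estimates track true effects
```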

      - How were the “most significantly associated variants” selected into the PGS in the polygenic prediction part? Based on a context-specific test? A combined-context test of effect size estimates?

      For the “Additive” and “Additive ascertainment, GxE estimation” models (red and orange in Fig. 5, respectively), we ascertain the combined-context set. For the “GxE” and “polygenic GxE” (green and blue in Fig. 5, respectively) models, we ascertain in a context-specific test. We now state this explicitly in lines 280-288 and lines 507-526.
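      The difference between the two ascertainment schemes can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the combined-context ranking here uses an inverse-variance-weighted z-score, which is one common choice but may differ from the combined-context test actually used, and the function names are hypothetical.

```python
import numpy as np

def top_k(z, k):
    """Indices of the k variants with the largest |z| (ties broken by position)."""
    return np.argsort(-np.abs(z), kind="stable")[:k]

def ascertain(beta_hat, se, k, mode):
    """Select k variants per context from (n_variants, n_contexts) estimates and SEs.

    mode="combined": rank once by an inverse-variance-weighted combined z-score
                     (hypothetical choice) and reuse the same set in every context.
    mode="context":  rank separately by each context's own z-score.
    Returns a list with one index array per context.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    n_ctx = beta_hat.shape[1]
    if mode == "combined":
        w = 1.0 / se**2
        beta_c = (w * beta_hat).sum(axis=1) / w.sum(axis=1)
        z_c = beta_c * np.sqrt(w.sum(axis=1))  # beta_c / se_c, with se_c = 1 / sqrt(sum of w)
        idx = top_k(z_c, k)
        return [idx] * n_ctx
    if mode == "context":
        z = beta_hat / se
        return [top_k(z[:, c], k) for c in range(n_ctx)]
    raise ValueError(f"unknown mode: {mode!r}")

def pgs(genotypes, effects, idx):
    """Polygenic score: allele counts times effect sizes, summed over selected variants."""
    genotypes = np.asarray(genotypes, dtype=float)
    effects = np.asarray(effects, dtype=float)
    return genotypes[:, idx] @ effects[idx]

beta_hat = np.array([[5.0, 0.0], [0.0, 5.0], [3.0, 3.0]])
se = np.ones_like(beta_hat)
print(ascertain(beta_hat, se, 1, "combined"))
print(ascertain(beta_hat, se, 1, "context"))
```

      In this toy example the combined ranking prefers the variant with moderate effects in both contexts, whereas context-specific ranking picks up the variants whose effects are confined to a single context.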

      - As stated, I find the conclusion statement not specific enough in light of the rest of the manuscript. “the consideration of polygenic GxE trends is key” - this is very vague. What does it mean “to consider polygenic GxE trends” in the context of this paper? I can’t tell. “The notion that complex trait analyses should combine observations at top associated loci” - I don’t think the authors really refer to combining “observations”, rather perhaps combine information from top associated loci. But this does not represent the “top hits” approach that merely counts loci by their testing patterns. “It may be a similarly important missing piece...” What does “it” refer to? The top loci? What makes it an important missing piece?

      We rewrote the conclusion paragraph to address these concerns (lines 316-321).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      […] Overall, this is an important paper that demonstrates that one model for transgenerational inheritance in C. elegans is not reproducible. This is important because it is not clear how many of the reported models of transgenerational inheritance reported in C. elegans are reproducible. The authors do demonstrate a memory for F1 embryos that could be a maternal effect, and the authors confirm that this is mediated by a systemic small RNA response. There are several points in the manuscript where a more positive tone might be helpful.

      We would like to correct the statement made in the second-to-last sentence. The demonstration of an F1 response to PA14 was first reported by Moore et al., (2019) and then by Pereira et al., (2020) using a different behavioral assay. We merely confirmed these results in our hands, and confirmed the observation, first reported by Kaletsky et al., (2020), that sid-1 and sid-2 are required for this F1 response, although we did find that sid-1 and sid-2 are not required for the PA14-induced increase in daf-7p::gfp expression in ASI neurons in the F1 progeny of trained adults, which had not been addressed in the published work.

      Yes, the intergenerational F1 response could be a maternal effect, but the in utero F1 embryos and their precursor germ cells were directly exposed to PA14 metabolites and toxins (non-maternal effect) as well as any parental response, whether mediated by small RNAs, prions, hormones, or other unknown information carriers. While the F1 aversion response does require sid-1 and sid-2, we would not presume that the substrate is therefore an RNA molecule, particularly because the systemic RNAi response supported by sid-1 and sid-2 is via long double-stranded RNA. To date, no evidence suggests that either protein transports small RNAs, particularly single-stranded RNAs.

      Strengths:

      The authors note that the high copy number daf-7::GFP transgene used by the Murphy group displayed variable expression and evidence for somatic silencing or transgene breakdown in the Hunter lab, as confirmed by the Murphy group. The authors nicely use single copy daf-7::GFP to show that neuronal daf-7::GFP is elevated in F1 but not F2 progeny with regards to the memory of PA14 avoidance, speaking to an intergenerational phenotype.

      The authors nicely confirm that sid-1 and sid-2 are generally required for intergenerational avoidance of F1 embryos of moms exposed to PA14. However, these small RNA proteins did not affect daf-7::GFP elevation in the F1 progeny. This result is unexpected given previous reports that single copy daf-7::GFP is not elevated in F1 progeny of sid mutants. Because the Murphy group reported that daf-7 mutation abolishes avoidance for F1 progeny, this means that the sid genes function downstream of daf-7 or in parallel, rather than upstream as previously suggested.

      The published report (Moore et al., 2019) shows only multicopy daf-7p::gfp results and does not address the daf-7p::gfp response in sid-1 or sid-2 mutants. Thus, our discovery that systemic RNAi, exogenous RNAi, and heritable RNAi mutants don’t disrupt elevated daf-7p::gfp in ASI neurons in the F1 progeny of PA14 trained P0’s is only unexpected with respect to the published models (Moore et al., 2019, Kaletsky et al., 2020).

      The authors studied antisense small RNAs that change in Murphy data sets, identifying 116 mRNAs that might be regulated by sRNAs in response to PA14. Importantly, the authors show that the maco-1 gene, putatively targeted by piRNAs according to the Kaletsky 2020 paper, displays few siRNAs that change in response to PA14. The authors conclude that the P11 ncRNA of PA14, which was proposed to promote interkingdom RNA communication by the Murphy group, is unlikely to affect maco-1 expression by generating sRNAs that target maco-1 in C. elegans. The authors define 8 genes based on their analysis of sRNAs and mRNAs that might promote resistance to PA14, but they do not further characterize these genes' role in pathogen avoidance. The Murphy group might wish to consider following up on these genes and their possible relationship with P11.

      Weaknesses:

      This very thorough and interesting manuscript is at times pugnacious.

      We reiterate that we never claimed that Moore et al., (2019) did not obtain their reported results. We simply stated that we could not replicate their results using the published methods and then failed in our search to identify variable(s) that might account for our results. In revising the manuscript, we have striven to make clear, unmuddied statements of facts and state that future investigations may provide independent evidence that supports the original claims and explains our divergent results.

      Please explain more clearly what High Growth media is in the text and methods, conveying why it was used by the Murphy lab, and whether Normal Growth or High Growth is better for intergenerational heritability assays.

      We added the standard recipes and the following explanation to the Methods section of the revised text.

      “NG plates minimally support OP50 growth, resulting in a thin lawn that facilitates visualization of larvae and embryos. HG plates (8X more peptone) support much higher OP50 growth, resulting in a thick bacterial lawn that supports larger worm populations.”

      We have also included the following text in our presentation and discussion of the effects of growth conditions on worm choice in PA14 vs OP50 choice assays.

      “Furthermore, because OP50 pathogenicity is enhanced by increased E. coli nutritive conditions (Garsin et al., 2003, Shi et al., 2006), the growth of F1-F4 progeny on High Growth (HG) plates (Moore et al., 2019; 2021b), which contain 8X more peptone than NG plates and therefore support much higher OP50 growth levels, immediately prior to the F1-F4 choice assays may further contribute to OP50 aversion among the control animals.”

      We don’t know enough to claim that HG or NG media is better than the other for intergenerational assays, but they are different. Thus, switching between the two in a multigenerational experiment likely introduces unknown variability.

      Reviewer #2 (Public Review):

      This paper examines the reproducibility of results reported by the Murphy lab regarding transgenerational inheritance of a learned avoidance behavior in C. elegans. It has been well established by multiple labs that worms can learn to avoid the pathogen Pseudomonas aeruginosa (PA14) after a single exposure. The Murphy lab has reported that learned avoidance is transmittable to 4 generations and dependent on a small RNA expressed by PA14 that elicits the transgenerational silencing of a gene in C. elegans. The Hunter lab now reports that although they can reproduce inheritance of the learned behavior by the first generation (F1), they cannot reproduce inheritance in subsequent generations.

      This is an important study that will be useful for the community. Although the authors fail to identify a "smoking gun", the study examines several possible sources for the discrepancy, and their findings will be useful to others interested in using these assays. The preference assay appears to work in their hands inasmuch as they are able to detect the learned behavior in the P0 and F1 generations, suggesting that the failure to reproduce the transgenerational effect is not due to trivial mistakes in the protocol. An obvious reason, however, to account for the differing results is that the culture conditions used by the authors are not permissive for the expression of the small RNA by PA14 that the Murphy lab identified as required for transgenerational inheritance. It would seem prudent for the authors to determine whether this small RNA is present in their cultures, or at least acknowledge this possibility.

      We thank the reviewer for raising this issue and have added the following statement to this effect in the revised manuscript.

      “We note that previous bacterial RNA sequence analysis identified a small non-coding RNA called P11 whose expression correlates with bacterial growth conditions that induce heritable avoidance (Kaletsky et al., 2020). Critically, C. elegans trained on a PA14 ΔP11 strain (which lacks this small RNA) still learn to avoid PA14, but their F1 and F2-F4 progeny fail to show an intergenerational or transgenerational response (Figure 3L in Kaletsky et al., 2020). The fact that we observed an intergenerational (F1) avoidance response is evidence that our PA14 growth conditions induce P11 expression.”

      We believe that this addresses the concern raised here.

      The authors should also note that their protocol was significantly different from the Murphy protocol (see comments below) and therefore it remains possible that protocol differences cumulatively account for the different results.

      As suggested below, we have added to the supplemental documents the protocol we followed for the aversion assay. In our view, this document shows that our adjustments to the core protocol were minor. Furthermore, where possible, these adjustments were explicitly tested in side-by-side experiments for both the aversion assay and the daf-7p::gfp expression assay and presented in the manuscript.

      To discover the source(s) of discrepancy between our results and the published results we subsequently introduced variations to this core protocol to exclude likely variables (worm and bacteria growth temperatures, assay conditions, worm handling methods, bacterial culture and storage conditions, and some minor developmental timing issues). Again, where possible, the effect of variations was tested in side-by-side experiments for both the aversion assay and the daf-7p::gfp expression assay and were presented in or have now been added to the manuscript.

      It remains possible that we misunderstood the published Murphy lab protocols, but we were highly motivated to replicate the results so we could use these assays to investigate the reported RNAi-pathway dependent steps, thus we read every published version with extreme care.

      Reviewer #3 (Public Review):

      […] Strengths:

      (1) The authors provide a thorough description of their methods, and a marked-up version of a published protocol that describes how they adapted the protocol to their lab conditions. It should be easy to replicate the experiments.

      As noted above in response to a suggestion by reviewer #2, we have replaced the annotated published protocol with the protocol that we followed. This will aid other groups' attempts to replicate our experimental conditions.

      (2) The authors test the source of bacteria, growth temperature (of both C. elegans and bacteria), and light/dark husbandry conditions. They also supply all their raw data, so that the sample size for each testing plate can be easily seen (in the supplementary data). None of these variations appears to have a measurable effect on pathogen avoidance in the F2 generation, with all but one of the experiments failing to exhibit learned pathogen avoidance.

      We note that the parallel analysis of daf-7p::gfp expression in ASI neurons was also tested for several of these conditions and also failed to replicate the published findings.

      (3) The small RNA seq and mRNA seq analysis is well performed and extends the results shown in the original paper. The original paper did not give many details of the small RNA analysis, which was an oversight. Although not a major focus of this paper, it is a worthwhile extension of the previous work.

      (4) It is rare that negative results such as these are accessible. Although the authors were unable to determine the reason that their results differ from those previously published, it is important to document these attempts in detail, as has been done here. Behavioral assays are notoriously difficult to perform and public discourse around these attempts may give clarity to the difficulties faced by a controversial field.

      Thank you for your support. Choosing to pursue publication of these negative results was not an easy decision, and we thank members of the community for their support and encouragement.

      Weaknesses:

      (1) Although the "standard" conditions have been tested over multiple biological replicates, many of the potential confounders that may have altered the results have been tested only once or twice. For example, changing the incubation temperature to 25°C was tested in only two biological replicates (Exp 5.1 and 5.2) - and one of these experiments actually resulted in apparent pathogen avoidance inheritance in the F2 generation (but not in the F1). An alternative pathogen source was tested in only one biological replicate (Exp 3). Given the variability observed in the F2 generation, increasing biological replicates would have added to the strengths of the report.

      We agree that our study was not exhaustive in our exploration of variables that might be interfering with our ability to detect F2 avoidance. We also note that some of these variables also failed (with many more independent experiments) to induce elevated daf-7p::gfp expression in ASI neurons in F2 progeny. Our goal was not to show that variation in some growth or assay condition would generate reproducible negative results; rather, the exploration was designed to tweak conditions to enable detection of a robust F2 response. Given the strength of the data presented in Moore et al., (2019) we expected that adjustment of the problematic variable would produce positive results apparent in a single replicate, which could then be followed up. If we had succeeded, then we would have documented the conditions that enabled robust F2 inheritance and would have explored molecular mechanisms that support this important but mysterious process.

      (2) A key difference between the methods used here and those published previously, is an increase in the age of the animals used for training - from mostly L4 to mostly young adults. I was unable to find a clear example of an experiment when these two conditions were compared, although the authors state that it made no difference to their results.

      We can state firmly that the apparent time delay did not affect P0 learned avoidance (new Figure S1) or, as documented in Table S1, daf-7p::gfp expression in ASI neurons. In our experience, training mostly L4’s on PA14 frequently failed to produce sufficient F1 embryos for both F1 avoidance assays or daf-7p::gfp measurements in ASI neurons and collection of F2 progeny. Indeed, in early attempts to detect heritable PA14 aversion, trained P0 and F1 progeny were not assayed in order to obtain sufficient F2’s for a choice assay. These animals failed to display aversion, but without evidence of successful P0 training or an F1 intergenerational response this was deemed a non-fruitful trouble-shooting approach. We have added supplemental Figure S1 which presents P0 choice assay results from experiments using younger trained animals that failed to produce sufficient F1’s to continue the inheritance experiments.

      The different timing at the start of training between the two protocols may reflect the age of the recovered bleached P0 embryos. It is reasonable to assume that bleaching day 1 adults vs day 2 or 3 adults from the P-1 population could shift the average age of recovered P0 embryos by several hours. The Murphy protocol only states that P0 embryos were obtained by bleaching healthy adults. Regardless, if the hypothesis entertained here is true, that a several hour difference in larval/adult age during 24 hours of training affects F2 inheritance of learned aversion but does not affect P0 learned avoidance, then we would argue that this paradigm for heritable learned avoidance, as described in Moore et al., (2019, 2021), is not sufficiently robust for mechanistic investigations.

      (3) The original paper reports a transgenerational avoidance effect up to the F5 generation. Although in this work the authors failed to see avoidance in the F2 generation, it would have been prudent to extend their tests for more generations in at least a couple of their experiments to ensure that the F2 generation was not an aberration (although this reviewer acknowledges that this seems unlikely to be the case).

      We would point out that we also failed to robustly replicate the F2 response in the daf-7p::gfp expression assays. An F2-specific aberration that affects two different assays seems quite unlikely, and it remains unclear how we would interpret a positive result in F3 and F4 generations without a positive result in the F2 generation. Were we to further extend these investigations, we believe that exploration of additional culture conditions would warrant higher priority than extension of our results to the F3 and F4 generations.

      Reviewing Editor Comments:

      The reviewers' suggestions for improving the manuscript were mostly minor, to change the wording in some places and to add some more explanation regarding the methods.

      What should be highlighted in the section on OP50 growth conditions is that the initial preference for PA14 in the Murphy lab has also been observed by multiple other labs (Bargmann, Kim, Zhang, Aballay). The fact that this preference was not observed by the Hunter lab is one of several indicators of subtle differences in the environment that might add up to explain the differences in results.

      We agree that subtle known and unknown differences in OP50 and PA14 culture conditions can have measurable effects on the detection of PA14 attraction/aversion relative to OP50 attraction/aversion that could obscure or create the appearance of heritable effects between generations. We have added (see below) to the text a fuller description of the variability in the initial or naive preference observed in different laboratories using similar or variant 2-choice assays and culture conditions. It is worth emphasizing that direct comparison of the OP50 growth conditions specified in Moore et al., (2021) frequently revealed a much larger effect on the naïve choice index than is reported between labs (Figure 4).  

      “Naïve (OP50 grown) worms often show a bias towards PA14 in choice assays (Zhang et al., 2005; Ha et al., 2010; Moore et al., 2019; Pereira et al., 2020; Lalsiamthara and Aballay, 2022). This response, rather than representing an innate attraction to PA14, likely reflects the context of the worm's recent growth on OP50, a mild C. elegans pathogen (Garigan et al., 2002; Garsin et al., 2003; Shi et al., 2006). Thus, the naïve worms presented with a choice between a recently experienced mild pathogen (OP50) and a novel food choice (PA14) initially choose the novel food instead of the known mild pathogen (OP50 aversion).

      In line with our results, some other groups have also reported higher naïve choice index scores (Lee et al., 2017). This variability in naïve choice may reflect differences in growth conditions of either the OP50 or PA14 bacteria. In addition, we note that among the studies that show naïve worm attraction to Pseudomonas (OP50 aversion) there are extensive methodological differences from the methods in Moore et al., (2019; 2021b), including differences in bacterial growth temperature, incubation time, whether the bacteria is diluted or concentrated prior to placement on the choice plates, the concentration of peptone in the choice plates, the length of the choice assay, and the inclusion of sodium azide in the choice assays (Zhang et al., 2005; Ha et al., 2010; Moore et al., 2019; Pereira et al., 2020; Lalsiamthara and Aballay, 2022). Thus, the cause of the variability across published reports is not clear.”
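      Because the discussion of naïve preference turns on the sign and magnitude of the choice index, a minimal sketch of the calculation may help. It assumes the convention choice index = (worms on OP50 − worms on PA14) / (worms on either spot), so negative values indicate a bias toward PA14 and positive values indicate PA14 avoidance; the exact counting rules (for example, whether worms outside both spots are excluded) are an assumption here and should be checked against the protocol in use.

```python
def choice_index(n_op50, n_pa14):
    """Choice index under an assumed convention:
    (worms on OP50 - worms on PA14) / (worms on either spot).
    +1 means every scored worm chose OP50 (full PA14 avoidance);
    -1 means every scored worm chose PA14.
    """
    total = n_op50 + n_pa14
    if total == 0:
        raise ValueError("no worms scored on either bacterial spot")
    return (n_op50 - n_pa14) / total

# Hypothetical counts, for illustration only
print(choice_index(30, 70))  # naive-like plate with a bias toward PA14
print(choice_index(70, 30))  # trained-like plate showing PA14 avoidance
```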

      Overall, an emphasis on the absence of robustness of the reported results, rather than failure to reproduce them (which can always have many reasons), is appropriate.

      We agree that an emphasis on robustness is appropriate and have modified the text throughout the manuscript to shift the emphasis to absence of robustness. This includes a change to the manuscript title, which is now, “Reported transgenerational responses to Pseudomonas aeruginosa in C. elegans are not robust”

      A significant experimental addition would be some attempts to determine whether the bacterial PA14 pathogen in the authors' lab produces the P11 small RNA, which has been proposed to have a causal role in initiating the previously reported transgenerational inheritance.

      We acknowledge in the revised manuscript that a subsequent publication (Kaletsky et al., 2020) identified a correlation between PA14 training conditions that induced transgenerational memory and the expression of P11, a P. aeruginosa small non-coding RNA (see our response above to Reviewer #2’s similar query). While testing for the presence of P11 in Harvard culture conditions would be an important assay in any study whose purpose was to investigate the proposed P11-mediated mechanism underlying the transgenerational responses reported by the Murphy Lab, our goal was rather to replicate the robust transgenerational (F2) responses to PA14 training and then to investigate in more detail how sid-1 and sid-2 contribute to transgenerational epigenetic inheritance. Neither sid-1 nor sid-2 is predicted to transport small RNAs or single-stranded RNAs, thus testing for the presence of P11 is less relevant to our goals. Regardless, we note that Figure 3L in Kaletsky et al., (2020) showed that PA14 ΔP11 bacteria failed to induce an F1 avoidance response. Thus, the fact that we observed F1 avoidance implies that our culture conditions successfully induced P11 expression.

      Reviewer #1 (Recommendations For The Authors):

      The abstract could be more positive by concluding that 'We conclude that this example of transgenerational inheritance lacks robustness but instead reflects an example of small RNA-mediated intergenerational inheritance.'

      As recommended, we have added additional clarifying information to the abstract and moderated the conclusion sentence.

      “We did confirm that the dsRNA transport proteins SID-1 and SID-2 are required for the intergenerational (F1) inheritance of pathogen avoidance, but not for the F1 inheritance of elevated daf-7 expression. Furthermore, our reanalysis of RNA seq data provides additional evidence that this intergenerational inherited PA14 response may be mediated by small RNAs.”

      “We conclude that this example of transgenerational inheritance lacks robustness, confirm that the intergenerational avoidance response, but not the elevated daf-7p::gfp expression in F1 progeny, requires sid-1 and sid-2, and identify candidate siRNAs and target genes that may mediate this intergenerational response.”

      Differential expression of sRNAs or mRNAs might be better understood quantitatively by presenting data in scatterplots (Reed and Montgomery 2020) rather than in volcano plots.

      We agree and have modified Figure 6A and 6B.

      This statement in the main text might be unnecessary, as it affects the tenor of the conclusion of this significant manuscript. 'We note that none of the raw data for the published figures and unpublished replicate experiments . . . this hampered our ability to fully compare'.

      We have rewritten this paragraph to focus on our goal: to identify the source of the discrepancy between our results and the published results. We considered discarding this statement but ultimately decided that our inability to directly compare our data to that of previously published work is a shortcoming of our study that deserves to be acknowledged and explained.

      “Ideally, we would have compared our results with the published results (Moore et al., 2019), to possibly identify additional experimental parameters for further investigation; for example, a quantitative comparison of naïve choice in the P0 and F1 generations could help to determine the role of bacterial growth in the choice assay response. However, none of the raw data for the published figures and unpublished replicate experiments (Moore et al., 2019) were available on the publisher’s website or provided upon request to the corresponding author. In the absence of a quantitative comparison, it remains possible that an explanation for the discrepancies between our results and those of Moore et al., (2019) has been overlooked.”

      The final sentence of the Discussion could be tempered and more positive by stating 'Thus independent reproducibility is of paramount concern, and we have tried to be completely transparent as a model for how heritability research should be conducted within the C. elegans community'.

      Thank you. The suggested sentence nicely captures our intention. We now use it, almost verbatim, as our final sentence.

      “Thus, independent reproducibility is of paramount concern, and we have tried to be completely transparent as a model for how heritability research should be presented within the C. elegans community.”

      Reviewer #2 (Recommendations For The Authors):

      Specific comments:

      (1) Protocol: It is difficult to assess from the Methods the exact protocol used by the authors to assay food preference. The annotated Murphy protocol is not sufficient. The authors should provide their own protocol - a detailed lab-ready protocol where every step is outlined, and any steps that deviate from the Murphy lab protocol are called out.

      Thank you for this excellent suggestion. We now include a protocol that documents the precise steps, timings, and controls that we followed (S1_aversion_protocol). We also include footnotes that explain the reasons behind particular steps and document known differences from the published protocol. Given the thoroughness of this approach, we have removed the annotated version of Moore et al., (2021) from the revised submission.

      (2) The authors imply in the methods that, unlike the Murphy lab, they did NOT use azide in the assay, and instead used 4 °C to "freeze" the worms in place. It is not clear whether this method was used throughout all their assays and whether this could be a source of the difference. This change is NOT indicated in the annotated Murphy lab STAR Protocol they provide in the supplement.

      We apologize for the lack of clarity. Concerned that azide might interfere with our ability to detect heritable silencing, we tested cold-induced rigor and then used it to preserve worm choice in some choice assays. This was not a change to the core protocol but a variation used in some assays to determine whether azide could reduce our ability to detect heritable behavioral responses to PA14 exposure. As Moore et al., (2021) show, too much azide can affect measurement of worm choice, and too little or ineffective azide can as well. Azide also affects bacteria (both OP50 and PA14), which could alter the production of molecules that attract or repel worms, much as performing the assay in light versus dark conditions can influence the measured choice index.

      In our hands, cold-induced rigor worked well and within biological replicates was indistinguishable from azide (Figure S10). Thus, we include those results in our analysis and now indicate in Tables 2 and S2 and in Figures 1 and 3 which experiments used which method. As suggested, we now provide a detailed protocol that includes a note describing our precise method for cold-induced rigor.

      Also, the number of worms used in each assay needs to be specified (same or different from Murphy protocol?), and whether any worms were "censored" as in the Murphy protocol, and if so on what basis.

      While we published the exact number of worms scored in each assay (on each plate), it is unknown how this compares with Moore et al., (2019), because the number of animals in the presented choice assays (either per plate or per choice) was not reported. Details on censoring, on when to exclude data, and on additional criteria for abandoning an in-progress experiment are now provided in the protocol (S1_aversion_protocol).
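      To illustrate why per-plate worm counts matter when comparing choice assays across labs, the sketch below computes a choice index and its binomial standard error. It assumes the definition commonly used for this assay, CI = (n_OP50 - n_PA14) / (n_OP50 + n_PA14), which is not restated in this response; the function names and worm counts are hypothetical, and treating each worm as an independent choice is itself a simplifying assumption.

```python
import math


def choice_index(n_op50: int, n_pa14: int) -> float:
    """Choice index, assuming CI = (n_OP50 - n_PA14) / (n_OP50 + n_PA14).

    +1 means every scored worm chose OP50; -1 means every worm chose PA14.
    """
    total = n_op50 + n_pa14
    if total == 0:
        raise ValueError("no worms scored on either lawn")
    return (n_op50 - n_pa14) / total


def choice_index_se(n_op50: int, n_pa14: int) -> float:
    """Binomial standard error of the CI, treating worms as independent choices.

    CI = 2p - 1 with p = n_op50 / total, so SE(CI) = 2 * sqrt(p * (1 - p) / total).
    """
    total = n_op50 + n_pa14
    if total == 0:
        raise ValueError("no worms scored on either lawn")
    p = n_op50 / total
    return 2 * math.sqrt(p * (1 - p) / total)


# Hypothetical plates with the same 60:40 split but different numbers of scored worms.
for n_op50, n_pa14 in [(12, 8), (60, 40), (300, 200)]:
    ci = choice_index(n_op50, n_pa14)
    se = choice_index_se(n_op50, n_pa14)
    print(f"n={n_op50 + n_pa14:4d}  CI={ci:+.2f}  SE={se:.3f}")
```

      With the same 60:40 split, the standard error shrinks roughly with the square root of the number of worms scored, so plates with few animals can yield noticeably different choice indices by chance alone.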

      (3) Several instances in the text cite changes in the protocol as producing "no meaningful differences" without referring to a specific experiment that supports that statement (for example, line 399 regarding azide).

      We now include data and methods comparing azide and cold-induced rigor (Supplemental document S1_aversion_protocol, Supplemental Figure S10), and data showing the P0 choice index for 48-52 hour post-bleach L4/young adults (Supplemental Figure S1), in addition to the previously noted absence of effects due to differences in embryo bleaching protocols (Figures 2, 3 and Tables 1, 2, S1, and S2).

      (4) If the authors want to claim the irreproducibility of the Murphy lab results, they should use the exact protocol used by the Murphy lab in its entirety. It is not sufficient to show that individual changes do not affect the outcome, since the protocol they use appears to include SEVERAL changes which could cumulatively affect the results. If the authors do not want to do this, they should at least acknowledge and summarize in their discussion ALL their protocol changes.

      We acknowledge these minor differences between the protocols we followed and the published methods but disagree that they invalidate our results. We transparently present the effect of known minimal protocol changes. We also present an analysis of possible invalidating variations (number of animals in a choice assay). We emphasize that in our hands both measures of TEI, the choice assay and measurement of daf-7p::gfp in ASI neurons, failed to replicate the published transgenerational results.

      If the protocol is sensitive to how animals are counted, to whether bleached embryos are mixed gently or vigorously, or to a few hours' difference in age at training, then in our view this TEI paradigm is not robust.

      See also our response to reviewer #3’s public reviews above.

      (5) The authors acknowledge that "non-obvious growth culture differences" could account for the different results. In this respect, the Murphy lab has proposed that the transgenerational effect requires a small RNA expressed in PA14. The authors should check that this RNA is expressed in the cultures they grow in their lab and use for their experiments. This could potentially identify where the two protocols diverge.

      The bacterial culture conditions and worm training procedures described in Moore et al., (2019) successfully produced trained P0 animals that transmitted a PA14 aversion response to their F1 progeny. In a subsequent publication (Kaletsky et al., 2020), the Murphy lab showed a correlation between the culture conditions that induce heritable avoidance and the expression of P11, a P. aeruginosa small non-coding RNA. As mentioned above in response to Reviewer #2’s public review and the Reviewing Editor’s comments to authors, the Murphy lab showed that PA14 ΔP11 bacteria fail to induce an F1 avoidance response (Figure 3L in Kaletsky et al., (2020)). Thus, the fact that we observed F1 avoidance implies that our culture conditions successfully induced P11 expression. We believe that this addresses the concern raised here. Furthermore, if P11 is not reliably expressed in pathogenic PA14, then the published model is unlikely to be relevant in a natural environment. Again, we thank the reviewer for raising this issue and have added this information to the revised manuscript (see above response to Reviewer #2’s Public Reviews).

      (6) Legend to Figure 1: please clarify which experiments were done with which PA14 isolates especially for A-C. What is the origin of the N2 strain used here?

      These details from Tables 2 and S2 have been added to Figure 1 panels A-C and Figure 3. Bristol N2, obtained from the CGC (reference 257), was used for aversion experiments.

      (7) Growth conditions: "These young adults produced comparable P0 and F1 results (Figure 1, Figure 2, and Figure 3)." It is not clear from the text what specific figure panels need to be compared to examine the effect of the variables described in the text. Please indicate which figure panels should be compared (lines 70-95).

      The information for the daf-7p::gfp expression experiments displayed in Figure 1 and Figure 2 is presented in Table 1 and Table S1. The data for P0 aversion training using younger animals is now presented in Figure S1.

      Reviewer #3 (Recommendations For The Authors):

      While overall I found this easy to follow and well-written, I think the clarity of the figures could be improved by incorporating some of the information from S2 into Figure 3. Besides the figure label listing the experiment (Exp1, Exp2, etc.), it would be helpful to add pertinent information about the experiment. For example, Exp 1.1 (light, 20 °C), Exp 1.2 (dark, 20 °C), Exp 5 (25 °C, light), etc.

      Thank you for the suggestion. These details from Tables 2 and S2 have been added to Figures 1 A-C, and 3.

      Citations

      • Moore, R.S., Kaletsky, R., and Murphy, C.T. (2019). Piwi/PRG-1 Argonaute and TGF-beta Mediate Transgenerational Learned Pathogenic Avoidance. Cell 177, 1827-1841 e1812.

      • Moore, R.S., Kaletsky, R., and Murphy, C.T. (2021). Protocol for transgenerational learned pathogen avoidance behavior assays in Caenorhabditis elegans. STAR Protoc 2, 100384.

      • Kaletsky, R., Moore, R.S., Vrla, G.D., Parsons, L.R., Gitai, Z., and Murphy, C.T. (2020). C. elegans interprets bacterial non-coding RNAs to learn pathogenic avoidance. Nature 586, 445-451.

      • Pereira, A.G., Gracida, X., Kagias, K., and Zhang, Y. (2020). C. elegans aversive olfactory learning generates diverse intergenerational effects. J Neurogenet 34, 378-388.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Chen and colleagues investigated ZC3H11A as a potential cause of high myopia (HM) in humans through the analysis of exome sequencing in 1,015 adolescents and experiments involving Zc3h11a knock-out mice. The authors showed four possibly pathogenic missense variants in four adolescents with HM. After that, the authors presented the phenotypic features of Zc3h11a knock-out mice, the result of RNA-sequencing, and a comparison of mRNA and protein levels of the functional candidates between wild-type and Zc3h11a knock-out mice. Based on their observations, the authors concluded that ZC3H11A protein contributes to the early onset of myopia.

      The strengths of this manuscript include: (1) successful identification of characteristic ophthalmic phenotypes in Zc3h11a knock-out mice, (2) demonstration of biological features related to myopia, such as PI3K-AKT and NF-kB pathways, and (3) inclusion of supporting human genetic data in individuals with HM. On the other hand, the weaknesses of this paper appear to be: (1) the lack of robust evidence from their genomic analysis, and (2) insufficient evidence to support phenotypic similarity between humans with ZC3H11A mutations and Zc3h11a knock-out mice. Given that the biological mechanisms of high myopia are not fully understood, the identification of a novel gene is valuable. As described in the manuscript, it is worth noting that a previous study using a myopic mouse model has implicated ZC3H11A in the etiology of myopia (Fan et al. PLoS Genet 2012).

      Thank you very much for your valuable suggestions.

      Specific comments:

      (1) I am concerned about the certainty of similarity in phenotypes between individuals with ZC3H11A mutation and Zc3h11a knock-out mice. A crucial point would be that there are no statistical differences in axial lengths (ALs) between wild-type and Zc3h11a knock-out mice at 8W and 10W, even though ALs in the individuals with ZC3H11A mutation were long. I would also like to note that the phenotypic information of these individuals is not available in the manuscript, although the authors indicated the suppressed b-wave amplitude in Zc3h11a knock-out mice. Considering that the authors described that "Detailed ophthalmic examinations were performed (lines: 321-323)", the detailed clinical features of these individuals should be included in the manuscript.

      Thank you for your valuable comments. Axial length in Zc3h11a Het-KO mice was significantly greater than in WT littermates at weeks 4 and 6 (independent-samples t-test, p < 0.05; Figure 2A and B). Although no significant differences were observed at other time points, axial length was still somewhat greater in Het-KO mice. We also measured corneal curvature and found no significant difference between the two groups. The refractive difference may therefore reflect the small size of the mouse eye: a 1 D change in refraction corresponds to only a 5-6 μm change in AL (1), whereas the SD-OCT used in this study has a relatively coarse theoretical resolution of 6 μm (2, 3), so small changes in vitreous cavity depth and AL may not reach statistical significance. In addition, some studies have shown that axial lengths measured in frozen sections are longer than those measured in vivo in age-matched mice (1, 4). Another possible explanation is a change in the curvature and refractive power of the lens. Together, these hypotheses may explain the mismatch between the changes in refraction and in ocular length parameters.

      Reference

      (1) Schmucker C, Schaeffel F. A paraxial schematic eye model for the growing C57BL/6 mouse. Vision Research 44, 1857-1867 (2004).

      (2) Yuan Y, Chen F, Shen M, Lu F, Wang J. Repeated measurements of the anterior segment during accommodation using long scan depth optical coherence tomography. Eye & Contact Lens 38, 102-108 (2012).

      (3) Shen M, et al. SD-OCT with prolonged scan depth for imaging the anterior segment of the eye. Ophthalmic Surgery, Lasers and Imaging Retina 41, S65-S69 (2010).

      (4) Schmucker C, Schaeffel F. In vivo biometry in the mouse eye with low coherence interferometry. Vision Research 44, 2445-2456 (2004).
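      The resolution argument in this response can be checked with simple arithmetic. The sketch below converts a refractive difference into an expected axial-length difference using the approximate 5-6 μm per diopter figure attributed to Schmucker and Schaeffel (2004), and compares the lower bound with the stated 6 μm theoretical SD-OCT resolution. The refractive differences in the loop are hypothetical, not the measured values, and exceeding the nominal resolution does not by itself guarantee a statistically significant group difference, which also depends on inter-animal variability and sample size.

```python
# Approximate relation quoted in the response: 1 D of refraction ~ 5-6 um of axial length.
UM_PER_DIOPTER_LOW = 5.0
UM_PER_DIOPTER_HIGH = 6.0
OCT_RESOLUTION_UM = 6.0  # theoretical SD-OCT resolution stated in the response


def expected_axial_change_um(refraction_diff_d: float) -> tuple[float, float]:
    """Return the (low, high) expected axial-length difference in micrometres."""
    d = abs(refraction_diff_d)
    return d * UM_PER_DIOPTER_LOW, d * UM_PER_DIOPTER_HIGH


def exceeds_resolution(refraction_diff_d: float,
                       resolution_um: float = OCT_RESOLUTION_UM) -> bool:
    """True only if even the smallest expected change exceeds the nominal resolution."""
    low, _ = expected_axial_change_um(refraction_diff_d)
    return low > resolution_um


for diff in (1.0, 2.0, 4.0):  # hypothetical myopic shifts in diopters
    low, high = expected_axial_change_um(diff)
    print(f"{diff:.1f} D -> {low:.0f}-{high:.0f} um; exceeds resolution: {exceeds_resolution(diff)}")
```

      On these assumptions, a 1 D difference corresponds to an axial change no larger than the nominal resolution, which is consistent with the authors' point that small refractive shifts in the mouse eye are hard to resolve by SD-OCT.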

      Regarding the “detailed ophthalmic examinations”: our patients were selected from a myopia screening cohort of over one million children and adolescents (the children and adolescents myopia survey [CAMS] program), in which the ophthalmic examination included only semi-annual refractive error measurements (five in total, with refractive error taken as the average of the three maximum values) and a single axial length measurement. The inaccurate description of “Detailed clinical features” has been removed.

      (2) The term "pathogenic variant" should be used cautiously. Please clarify the pathogenicity of the reported variants in accordance with the ACMG guideline.

      Thank you for your valuable comments. Four missense mutations in the ZC3H11A gene (c.412G>A, p.V138I; c.128G>A, p.G43E; c.461C>T, p.P154L; and c.2239T>A, p.S747T) were identified among the 1015 HM patients aged 15 to 18 years. All of the identified mutations occur at very low frequencies in, or are absent from, the Genome Aggregation Database (gnomAD) and ClinVar, and most are predicted to be highly pathogenic by SIFT, PolyPhen2, and CADD. Of these, c.412G>A, c.128G>A and c.461C>T lie in or near the zf-CCCH_3 domain (Figure 1A and B). Furthermore, all of the mutation sites are at amino acids that are highly conserved across species (Figure 1C). All four mutations resulted in greater conformational flexibility and altered the negative charge at the corresponding sites (Figure 1D and E). In addition, transfection with mutant overexpression plasmids showed that, compared with the wild type, nuclear IκBα mRNA expression was significantly reduced for all four mutants (ZC3H11A<sup>V138I</sup>, ZC3H11A<sup>G43E</sup>, ZC3H11A<sup>P154L</sup> and ZC3H11A<sup>S747T</sup>) (Supplement Figure 3). According to the ACMG guidelines, the above mutations can be classified as “Pathogenic Moderate”.

      (3) The genetic analysis does not fully support the claim that ZC3H11A is causative for HM. While the authors showed the rare allele frequencies and high CADD scores (> 20) of the identified variants, these were insufficient to establish causality. A helpful way to assess the causality would be performing a segregation analysis. An alternative approach is to show significant association by performing a gene-level association test. Assessing the pathogenicity of the variants using various prediction software, such as SIFT, PolyPhen2, and REVEL may also provide additional supportive evidence.

      Thank you for your valuable comments. We have added pathogenicity assessments of the variants using prediction software (SIFT, PolyPhen2, and CADD) and population variation databases (Genome Aggregation Database [gnomAD_AF] and ClinVar). In addition, transfection with mutant overexpression plasmids showed that, compared with the wild type, nuclear IκBα mRNA expression was significantly reduced for all four mutants (ZC3H11A<sup>V138I</sup>, ZC3H11A<sup>G43E</sup>, ZC3H11A<sup>P154L</sup> and ZC3H11A<sup>S747T</sup>) (Supplement Figure 3).

      (4) As shown in Figure 2, significant differences in refraction were observed from 4 weeks to 10 weeks. Nevertheless, no differences were observed in AL, anterior/vitreous chamber depth, and lens depth. The author should experimentally clarify what factors contribute to the observed difference in refraction.

      Thank you for your valuable comments. The existing data show significant differences in refraction from 4 to 10 weeks of age, with AL and vitreous cavity depth greater in Het mice than in WT mice at 4 and 6 weeks. Although no significant differences were observed at other time points, both parameters remained somewhat greater in Het mice. We also measured corneal curvature and found no significant difference between the two groups. The refractive difference may therefore reflect the small size of the mouse eye: a 1 D change in refraction corresponds to only a 5-6 μm change in AL (1), whereas the SD-OCT used in this study has a relatively coarse theoretical resolution of 6 μm (2, 3), so small changes in vitreous cavity depth and AL may not reach statistical significance. In addition, some studies have shown that axial lengths measured in frozen sections are longer than those measured in vivo in age-matched mice (1, 4). Another possible explanation is a change in the curvature and refractive power of the lens. Together, these hypotheses may explain the mismatch between the changes in refraction and in ocular length parameters.

      Reference

      (1) Schmucker C, Schaeffel F. A paraxial schematic eye model for the growing C57BL/6 mouse. Vision Research 44, 1857-1867 (2004).

      (2) Yuan Y, Chen F, Shen M, Lu F, Wang J. Repeated measurements of the anterior segment during accommodation using long scan depth optical coherence tomography. Eye & Contact Lens 38, 102-108 (2012).

      (3) Shen M, et al. SD-OCT with prolonged scan depth for imaging the anterior segment of the eye. Ophthalmic Surgery, Lasers and Imaging Retina 41, S65-S69 (2010).

      (4) Schmucker C, Schaeffel F. In vivo biometry in the mouse eye with low coherence interferometry. Vision Research 44, 2445-2456 (2004).

      (5) The gene names should be italicized throughout the manuscript.

      Thank you for your valuable comments. The gene names have been italicized throughout the manuscript.

      (6) Table 1: providing chromosomal positions and rs numbers (if available) would be helpful for readers.

      Thank you for your valuable comments. We have provided the chromosomal position and rs number (if available) of each mutation in Table 1.

      (7) Figure 5b, c, and d: the results of pathway analysis and GO enrichment analysis are difficult to interpret due to the small font size. It would be preferable to present these results in tables. Moreover, the authors should set a significant threshold in the enrichment analyses.

      Thank you for your valuable comments. We have increased the font size in the figure. In the retinal transcriptome analysis, differentially expressed genes (DEGs) were defined using a fold change (FC) of at least two and P < 0.05 as thresholds. For the GO term and KEGG pathway enrichment analyses, the top 20 terms with the most significant differences or the largest number of enriched genes are displayed.
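      As a minimal sketch of the DEG thresholds described above, the code below applies a two-fold change cutoff together with P < 0.05, and ranks enrichment terms by P value to keep the top 20. It assumes that "fold change of at least two" covers both up- and down-regulation (|log2 FC| ≥ 1) and uses whatever P values the analysis pipeline reports (whether raw or adjusted is not stated here); the gene names, term names, and values are hypothetical.

```python
import math

# Hypothetical per-gene results: (gene, fold change Het/WT, p-value); illustration only.
RESULTS = [
    ("GeneA", 2.5, 0.01),   # up-regulated 2.5-fold, significant
    ("GeneB", 0.4, 0.03),   # down-regulated 2.5-fold, significant
    ("GeneC", 1.8, 0.001),  # significant but below the two-fold cutoff
    ("GeneD", 3.0, 0.20),   # large change but not significant
]


def is_deg(fold_change: float, p_value: float,
           min_fc: float = 2.0, alpha: float = 0.05) -> bool:
    """Differentially expressed if |log2 FC| >= log2(min_fc) and p < alpha."""
    return abs(math.log2(fold_change)) >= math.log2(min_fc) and p_value < alpha


def top_terms(terms: list[tuple[str, float]], n: int = 20) -> list[tuple[str, float]]:
    """Return the n enrichment terms (name, p-value) with the smallest p-values."""
    return sorted(terms, key=lambda term: term[1])[:n]


degs = [gene for gene, fc, p in RESULTS if is_deg(fc, p)]
print(degs)  # ['GeneA', 'GeneB']
print(top_terms([("termX", 0.02), ("termY", 0.001), ("termZ", 0.04)], n=2))
```

      Ranking by the number of enriched genes instead of P value would only change the sort key in `top_terms`.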

      Reviewer #2 (Public Review):

      Summary: Chong Chen and colleagues reported that mutations were identified in the ZC3H11A gene in four adolescents from 1015 high myopia subjects in their myopia cohort. They further generated Zc3h11a knockout mice utilizing the CRISPR/Cas9 technology. They compared heterozygous knockout mice with control littermates and found refractive error changes, electrophysiological differences, and retinal inflammation-related gene expression differences. They concluded that ZC3H11A may play a role in the early onset of myopia by regulating inflammatory responses.

      Strengths:

      Data were shown from both clinical cohort and animal models.

      Weaknesses:

      Their findings are interesting and important; however, they need to resolve several points to support the current conclusion.

      (1) They described the ZC3H11A gene as a pathogenic variant for high myopia. It should be classified as pathogenic according to the guidelines of the American College of Medical Genetics and Genomics (Richards et al., Genet Med 17(5):405-24, 2015). The modes of inheritance for the families need to be shown. They also described identifying the gene as a "new" candidate. It should be checked in databases such as gnomAD and ClinVar, and any previous publications and be declared as a novel variant.

      Thank you for your valuable comments. Four missense mutations in the ZC3H11A gene (c.412G>A, p.V138I; c.128G>A, p.G43E; c.461C>T, p.P154L; and c.2239T>A, p.S747T) were identified among the 1015 HM patients aged 15 to 18 years. All of the identified mutations occur at very low frequencies in, or are absent from, the Genome Aggregation Database (gnomAD) and ClinVar, and most are predicted to be highly pathogenic by SIFT, PolyPhen2, and CADD. Of these, c.412G>A, c.128G>A and c.461C>T lie in or near the zf-CCCH_3 domain (Figure 1A and B). Furthermore, all of the mutation sites are at amino acids that are highly conserved across species (Figure 1C). All four mutations resulted in greater conformational flexibility and altered the negative charge at the corresponding sites (Figure 1D and E). In addition, transfection with mutant overexpression plasmids showed that, compared with the wild type, nuclear IκBα mRNA expression was significantly reduced for all four mutants (ZC3H11A<sup>V138I</sup>, ZC3H11A<sup>G43E</sup>, ZC3H11A<sup>P154L</sup> and ZC3H11A<sup>S747T</sup>) (Supplement Figure 3). According to the ACMG guidelines, the above mutations can be classified as “Pathogenic Moderate”.

      Unfortunately, our patients are part of the MAGIC project (aged 15 years or older), a cohort consisting of thousands of individuals with HM (patients from the children and adolescents myopia survey [CAMS] program) who have undergone WES. Information on their parents was not collected, so a segregation analysis could not be performed.

      (2) The phenotypes of the heterozygote mice are weak overall. The het mice showed mild to moderate myopic refractive shifts from 4 to 10 weeks of age. However, this cannot be explained by other ocular biometrics such as anterior chamber depth or lens thickness. Some differences are found between het and WT littermates in axial length and vitreous chamber depth but disappear after 8 weeks old. Furthermore, the early differences are not enough to explain the refractive error changes. They mentioned that they did not use homozygotes because of the embryonic lethality. I would strongly suggest employing conditional knockout systems to analyze homozygotes. This will also be able to identify the causative tissues/cells because they assume bipolar cells are functional. The cells in the retinal pigment epithelium and choroid are also important to contribute to myopia development.

      Thank you for your valuable comments. The existing data show significant differences in refraction from 4 to 10 weeks of age, with AL and vitreous cavity depth greater in Het mice than in WT mice at 4 and 6 weeks. Although no significant differences were observed at other time points, both parameters remained somewhat greater in Het mice. We also measured corneal curvature and found no significant difference between the two groups. The refractive difference may therefore reflect the small size of the mouse eye: a 1 D change in refraction corresponds to only a 5-6 μm change in AL (1), whereas the SD-OCT used in this study has a relatively coarse theoretical resolution of 6 μm (2, 3), so small changes in vitreous cavity depth and AL may not reach statistical significance. In addition, some studies have shown that axial lengths measured in frozen sections are longer than those measured in vivo in age-matched mice (1, 4). Another possible explanation is a change in the curvature and refractive power of the lens. Together, these hypotheses may explain the mismatch between the changes in refraction and in ocular length parameters.

      Reference

      (1) Schmucker C, Schaeffel F. A paraxial schematic eye model for the growing C57BL/6 mouse. Vision Research 44, 1857-1867 (2004).

      (2) Yuan Y, Chen F, Shen M, Lu F, Wang J. Repeated measurements of the anterior segment during accommodation using long scan depth optical coherence tomography. Eye & Contact Lens 38, 102-108 (2012).

      (3) Shen M, et al. SD-OCT with prolonged scan depth for imaging the anterior segment of the eye. Ophthalmic Surgery, Lasers and Imaging Retina 41, S65-S69 (2010).

      (4) Schmucker C, Schaeffel F. In vivo biometry in the mouse eye with low coherence interferometry. Vision Research 44, 2445-2456 (2004).

      A limitation of this study is that we did not examine homozygous knockout mice, for two reasons. First, our patients carry heterozygous mutations, so heterozygous knockout mice better model the human phenotype. Second, homozygous knockout is lethal, and we did not use a conditional knockout model for further study. In addition, we limited our investigation to the recognized, classical retina-to-sclera pathway of myopia and did not study other tissues such as the retinal pigment epithelium and choroid.

      (3) Their hypothesis regarding inflammatory gene changes and myopic development is not logical. Are the inflammatory responses evoked from bipolar cells? Did the mice show an accumulation of inflammatory cells in the inner retina? Visible retinal inflammation is not generally seen in either early-onset or high-myopia human subjects. Can this be seen in the actual subjects in the cohort? To me, this is difficult to adapt the retina-to-sclera signaling they mentioned in the discussion so far. Egr-1 may be examined as described.

      Thank you for your valuable comments. We have removed the hypothesis linking inflammatory gene changes to myopia development. At present, the explanation rests solely on correlations among signaling pathways; its theoretical basis comes from the following reference:

      “Lin et al., Role of Chronic Inflammation in Myopia Progression: Clinical Evidence and Experimental Validation. EBioMedicine, 2016 Aug:10:269-81, Figure 7.”

      Reviewer #3 (Public Review):

      Chen et al have identified a new candidate gene for high myopia, ZC3H11A, and using a knock-out mouse model, have attempted to validate it as a myopia gene and explain a potential mechanism. They identified 4 heterozygous missense variants in highly myopic teenagers. These variants are in conserved regions of the protein, but the authors provide no evidence that these specific variants affect protein function. They then created a knock-out mouse. Heterozygotes show myopia at all ages examined but increased axial length only at very early ages. Unfortunately, the authors do not address this point or examine corneal structure in these animals. They show that the mice have decreased B-wave amplitude on electroretinogram (a sign of retinal dysfunction associated with bipolar cells), and decreased expression of a bipolar cell marker, PKCa. They do not address, however, whether there are fewer bipolar cells, or simply decreased expression of the marker protein. On electron microscopy, there are morphologic differences in the outer nuclear layer (where bipolar, amacrine, and horizontal cell bodies reside). Transcriptome analysis identified over 700 differentially expressed genes. The authors chose to focus on the PI3K-AKT and NF-kB signaling pathways and show changes in the expression of genes and proteins in those pathways, including PI3K, AKT, IkBa, NF-kB, TGF-b1, MMP-2, and IL-6, although there is very high variability between animals. They propose that myopia may develop in these animals either as a result of visual abnormality (decreased bipolar cell function in the retina) or by alteration of NF-kB signaling. These data provide an interesting new candidate variant for the development of high myopia, and provide additional data that MMP2 and IL6 have a role in myopia development, but do not support the claim of the title that myopia is caused by an inflammatory reaction.

      Thank you for your valuable comments. Four missense mutations in the ZC3H11A gene (c.412G>A, p.V138I; c.128G>A, p.G43E; c.461C>T, p.P154L; and c.2239T>A, p.S747T) were identified among the 1015 HM patients aged 15 to 18 years. All of the identified mutations occur at very low frequencies in, or are absent from, the Genome Aggregation Database (gnomAD) and ClinVar, and most are predicted to be highly pathogenic by SIFT, PolyPhen2, and CADD. Of these, c.412G>A, c.128G>A and c.461C>T lie in or near the zf-CCCH_3 domain (Figure 1A and B). Furthermore, all of the mutation sites are at amino acids that are highly conserved across species (Figure 1C). All four mutations resulted in greater conformational flexibility and altered the negative charge at the corresponding sites (Figure 1D and E). In addition, transfection with mutant overexpression plasmids showed that, compared with the wild type, nuclear IκBα mRNA expression was significantly reduced for all four mutants (ZC3H11A<sup>V138I</sup>, ZC3H11A<sup>G43E</sup>, ZC3H11A<sup>P154L</sup> and ZC3H11A<sup>S747T</sup>) (Supplement Figure 3). According to the ACMG guidelines, the above mutations can be classified as “Pathogenic Moderate”.

      The existing data show significant differences in refraction from 4 to 10 weeks of age, with AL and vitreous cavity depth greater in Het mice than in WT mice at 4 and 6 weeks. Although no significant differences were observed at other time points, both parameters remained somewhat greater in Het mice. We also measured corneal curvature and found no significant difference between the two groups. The refractive difference may therefore reflect the small size of the mouse eye: a 1 D change in refraction corresponds to only a 5-6 μm change in AL (1), whereas the SD-OCT used in this study has a relatively coarse theoretical resolution of 6 μm (2, 3), so small changes in vitreous cavity depth and AL may not reach statistical significance. In addition, some studies have shown that axial lengths measured in frozen sections are longer than those measured in vivo in age-matched mice (1, 4). Another possible explanation is a change in the curvature and refractive power of the lens. Together, these hypotheses may explain the mismatch between the changes in refraction and in ocular length parameters.

      To evaluate changes in the number of a specific type of retinal cell, the most commonly used method involves staining with antibodies specific to the target cell type, followed by fluorescence microscopy. The fluorescence intensity or the number of cells can then be analyzed semi-quantitatively to assess changes in that cell type in the retina. For example, in models of retinal degeneration, rhodopsin-specific staining is used to identify the loss of rod cells. In our study, we selected PKC-α as a marker protein for bipolar cells to assess their number. Additionally, transmission electron microscopy (TEM) was used to examine cell morphology in the inner nuclear layer (INL) of Het mice, where bipolar cell bodies are located, and revealed structural damage. Based on both sets of data, we conclude that bipolar cells have indeed undergone structural damage and a reduction in number.

      References

      (1) Schmucker C, Schaeffel F. A paraxial schematic eye model for the growing C57BL/6 mouse. Vision Research 44, 1857-1867 (2004).

      (2) Yuan Y, Chen F, Shen M, Lu F, Wang J. Repeated measurements of the anterior segment during accommodation using long scan depth optical coherence tomography. Eye & Contact Lens 38, 102-108 (2012).

      (3) Shen M, et al. SD-OCT with prolonged scan depth for imaging the anterior segment of the eye. Ophthalmic Surgery, Lasers and Imaging Retina 41, S65-S69 (2010).

      (4) Schmucker C, Schaeffel F. In vivo biometry in the mouse eye with low coherence interferometry. Vision Research 44, 2445-2456 (2004).

      We have removed the hypothesis linking inflammatory gene changes to myopia development. At present, this explanation rests solely on the correlation between signaling pathways; its theoretical basis comes from the following reference:

      “Lin et al., Role of Chronic Inflammation in Myopia Progression: Clinical Evidence and Experimental Validation. EBioMedicine, 2016 Aug:10:269-81, Figure 7.”

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this detailed study, Cohen and Ben-Shaul characterized the AOB cell responses to various conspecific urine samples in female mice across the estrous cycle. The authors found that AOB cell responses vary with the strains and sexes of the samples. Between estrous and non-estrous females, no clear or consistent difference in responses was found. The cell response patterns, as measured by the distance between pairs of stimuli, are largely stable. When some changes do occur, they are not consistent across strains or male status. The authors concluded that AOB detects the signals without interpreting them. Overall, this study will provide useful information for scientists in the field of olfaction.

      Strengths:

      The study uses electrophysiological recording to characterize the responses of AOB cells to various urines in female mice. AOB recording is not trivial, as it requires activation of the VNO pump. The team uses a unique preparation to activate the VNO pump with electric stimulation, allowing them to record AOB cell responses to urines in anesthetized animals. The study comprehensively describes the AOB cell responses to social stimuli and how the responses vary (or not) with features of the urine source and the reproductive state of the recorded females. The dataset could be a valuable resource for scientists in the field of olfaction.

      Weaknesses:

      (1) The figures could be better labeled.

      Figures will be revised to provide more detailed labeling.

      (2) For Figure 2E, please plot the error bar. Are there any statistics performed to compare the mean responses?

      We did not perform statistical comparisons between the mean rates across the population. We will add this analysis and the corresponding error bars.

      (3) For Figure 2D, it will be more informative to plot the percentage of responsive units.

      We will do so.

      (4) Could the similarity in response be explained by the similarity in urine composition? The study will be significantly strengthened by understanding the "distance" of chemical composition in different urine.

      We agree. As we wrote in the Discussion: “Ultimately, lacking knowledge of the chemical space associated with each of the stimuli, this and all the other ideas developed here remain speculative.”

      A better understanding of the chemical distance is an important aspect that we aim to include in our future studies. However, this is far from trivial, as what matters is not chemical distance per se (which in itself is hard to define), but rather the “projection” of chemical space onto the array of vomeronasal receptor neurons. That is, knowledge of the chemical composition of the stimuli, without full knowledge of which molecules are vomeronasal system ligands, will only provide a partial picture. Despite these limitations, this is an important analysis, which we would have done had we had access to these data.

      (5) If it is not possible for the authors to obtain these data first-hand, published data on MUPs and chemicals found in these urines may provide some clues.

      Measurements of some classes of molecules may be found for some of the stimuli that we used here, but not for all. We are not aware of any single dataset that contains this information for any type of molecule (e.g., MUPs) across the entire stimulus set that we have used. More generally, pooling results from different studies has limited validity because of the biological and technical variability across studies. To reliably interpret our current recordings, it would be necessary to measure the urinary content of the very same samples that were used for stimulation. Unfortunately, we are not able to conduct this analysis at this stage.

      (6) It is not very clear to me whether the female overrepresentation is because there are truly more AOB cells that respond to females than males or because there are only two female samples but 9 male samples.

      It is true that the number of neurons fulfilling each of the patterns depends on the number of individual stimuli that define it. However, our measure of “over-representation” aims to overcome this bias by using bootstrapping to reveal whether the observed number of neurons fulfilling a pattern is larger than expected by chance. More generally, we note that a higher frequency of responses to female than to male stimuli has been observed in other studies, by us and by others, including when the number of male and female stimuli is matched (e.g., Bansal et al BMC Biol 2021, Ben-Shaul et al, PNAS 2010, Hendrickson et al, JNS, 2008).

      (7) If the authors only select two male samples, let's say ICR Naïve and ICR DOM, combine them with responses to two female samples, and do the same analysis as in Figure 3, will the female response still be overrepresented?

      We expect that the answer is positive, but we will perform this analysis to check.

      (8) In Figure 4B and 4C, the pairwise distance during non-estrus is generally higher than that during estrus, although they are highly correlated. Does it mean that the cells respond to different urines more distinctively during diestrus than in estrus?

      This is an important observation. For the Euclidean distance there might be a simple explanation, as the distance depends on the number of units (and more units were recorded in non-estrus females). However, this simple explanation does not hold for the correlation distance. A higher distance would imply better discrimination during the non-estrus stage, but our other analyses of sparseness and selectivity indices do not support this idea. We note that absolute values of distance measures should generally be interpreted cautiously, as they may depend on multiple factors, including sample size. Also, a small number of non-selective units could increase the correlation in responses among stimuli, and thus globally shift the distances. For these reasons, we focus on comparisons, rather than on the absolute values of the correlation distances. In the revised manuscript, we will note and discuss this important observation.
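      The dependence of Euclidean, but not correlation, distance on the number of units can be illustrated with a toy sketch. The response vectors below are hypothetical random values, not recorded data, and "recording more units" is modeled, as a simplifying assumption, by tiling the same vectors four times. Under that assumption the Euclidean distance scales by the square root of the size factor (here exactly 2), while the correlation distance is unchanged.

```python
import numpy as np


def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two population response vectors."""
    return float(np.linalg.norm(a - b))


def correlation_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 minus the Pearson correlation between two population response vectors."""
    return float(1.0 - np.corrcoef(a, b)[0, 1])


rng = np.random.default_rng(0)
n_units = 50
# Hypothetical responses of 50 units to two stimuli (toy data, not recordings).
resp_a = rng.normal(size=n_units)
resp_b = 0.5 * resp_a + rng.normal(size=n_units)

# Model a larger sample (4x the units) with the same response distribution
# by tiling the vectors, i.e. every unit is "recorded" four times.
big_a = np.tile(resp_a, 4)
big_b = np.tile(resp_b, 4)

for label, (a, b) in {"50 units": (resp_a, resp_b),
                      "200 units": (big_a, big_b)}.items():
    print(f"{label}: Euclidean={euclidean_distance(a, b):.3f}, "
          f"correlation distance={correlation_distance(a, b):.3f}")
```

      With real recordings the added units are not copies, so the correlation distance will fluctuate with sampling, but it has no systematic dependence on population size of the kind the Euclidean distance has.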

      (9) The correlation analysis is not entirely intuitive when just looking at the figures. Some sample heatmaps showing the response differences between estrous states will be helpful.

      If we understand correctly, the idea is to show the correlation matrices from which the values in 4B and 4C are taken. We can and will do this, probably as a supplementary figure.

      Reviewer #2 (Public review):

      Summary:

      Many aspects of the study are carefully done, and in the grand scheme this is a solid contribution. I have no "big-picture" concerns about the approach or methodology. However, in numerous places the manuscript is unnecessarily vague, ambiguous, or confusing. Tightening up the presentation will magnify their impact.

      We will revise the text with the aim of tightening the presentation.

      Strengths:

      (1) The study includes urine donors from males of three strains each with three social states, as well as females in two states. This diversity significantly enhances their ability to interpret their results.

      (2) Several distinct analyses are used to explore the question of whether AOB MCs are biased towards specific states or different between estrus and non-estrus females. The results of these different analyses are self-reinforcing about the main conclusions of the study.

      (3) The presentation maintains a neutral perspective throughout while touching on topics of widespread interest.

      Weaknesses:

      (1) Introduction:

      The discussion of the role of the VNS and preferences for different male stimuli should perhaps include Wysocki and Lepri 1991

      Agreed. We will refer to this work in our discussion.

      (2) Results:

      a) Given the 20s gap between them, the distinction between sample application and sympathetic nerve trunk stimulation needs to be made crystal clear; in many places, "stimulus application" is used in places where this reviewer suspects they actually mean sympathetic nerve trunk stimulation.

      In this study, we have considered both responses that are triggered by sympathetic trunk activation and those that occur (as happens in some preparations) immediately following stimulus application (and prior to nerve trunk stimulation). An example of the latter is provided in the second unit shown in Figure 1D (and this is also indicated in the figure legend). In our revision, we will further clarify this confusing point.

      b) There appears to be a mismatch between the discussion of Figure 3 and its contents. Specifically, there is an example of an "adjusted" pattern in 3A, not 3B.

      True. Thanks for catching this error. We will correct this.

      c) The discussion of patterns neglects to mention whether it's possible for a neuron to belong to more than one pattern. For example, it would seem possible for a neuron to simultaneously fit the "ICR pattern" and the "dominant adjusted pattern" if, e.g., all ICR responses are stronger than all others, but if simultaneously within each strain the dominant male causes the largest response.

      This is true. In the legend to Figure 3B, we actually write: “A neuron may fulfill more than one pattern and thus may appear in more than one row.”, but we will discuss this point in the main text as well.

      (3) Discussion:

      a) The discussion of chemical specificity in urine focuses on volatiles and MUPs (citation #47), but many important molecules for the VNS are small, nonvolatile ligands. For such molecules, the corresponding study is Fu et al 2015.

      We fully agree. We will expand our discussion and refer to Fu et al.

      b) "Following our line of reasoning, this scarcity may represent an optimal allocation of resources to separate dominant from naïve males": 1 unit out of 215 is roughly consistent with a single receptor. Surely little would be lost if there could be more computational capacity devoted to this important axis than that? It seems more likely that dominance is computed from multiple neuronal types with mixed encoding.

      We agree, and we are not claiming that dominance, or any other feature, is derived using dedicated feature-selective neurons. Our discussion of resource allocation is inevitably speculative. Our main point in this context is that a lack of over-representation does not imply that a feature is not important. We will revise our discussion to better clarify our view of this issue.

      (4) Methods:

      a) Male status, "were unambiguous in most cases": is it possible to put numerical estimates on this? 55% and 99% are both "most," yet they differ substantially in interpretive uncertainty.

      This sentence is actually misleading and irrelevant. Ambiguous cases were not considered as dominant for urine collection. We only classified mice as dominant if they won the tube test and exhibited dominant behavior in the subsequent observation period in the cage. We will correct the wording in the revised manuscript.

      b) Surgical procedures and electrode positioning: important details of probes are missing (electrode recording area, spacing, etc).

      True. We will add these details.

      c) Stimulus presentation procedure: Are stimuli manually pipetted or delivered by apparatus with precise timing?

      They are delivered manually. We will clarify this as well.

      d) Data analysis, "we applied more permissive criteria involving response magnitude": it's not clear whether this is what's spelled out in the next paragraph, or whether that's left unspecified. In either case, the next paragraph appears to be about establishing a noise floor on pattern membership, not a "permissive criterion."

      True, the next paragraph is not the explanation for the more permissive criteria. The more permissive criteria involving response magnitude are actually those described in Figure 3A and 3B. The sentence that was quoted above merely states that before applying those criteria, we had also searched for patterns defined by binary designation of neurons as responsive, or not responsive, to each of the stimuli (this is directly related to the next comment below). Using those binary definitions, we obtained a very small number of neurons for each pattern and thus decided to apply the approach actually used and described in the manuscript.

      e) Data analysis, method for assessing significance: there's a lot to like about the use of pooling to estimate the baseline and the use of an ANOVA-like test to assess unit responsiveness.

      But:

      i) for a specific stimulus, at 4 trials (the minimum specified in "Stimulus presentation procedure") kruskalwallis is questionable. They state that most trials use 5, however, and that should be okay.

      The number of cases with 4 trials is truly a minority, and we will provide the exact numbers in our revision.

      ii) the methods statement suggests they are running kruskalwallis individually for each neuron/stimulus, rather than once per neuron across all stimuli. With 11 stimuli, there is a substantial chance of a false-positive if they used p < 0.05 to assess significance. (The actual threshold was unstated.) Were there any multiple comparison corrections performed? Or did they run kruskalwallis on the neuron, and then if significant assess individual stimuli? (Which is a form of multiple-comparisons correction.)

      First, we indeed failed to mention that our significance criterion was p < 0.05. We will correct that in our revision. We did not apply any correction for multiple comparisons. We consider each neuron-stimulus pair as an independent entity, and we are aware that this leads to a higher false-positive rate. On the other hand, applying such corrections would be problematic, as we do not always use the same number of stimuli in different studies, so corrected thresholds would lead to different response criteria across studies. Notably, most, if not all, of our conclusions involve comparisons across conditions, and for this purpose we think that our procedure is valid. We do not attach any special meaning to the significance threshold, but rather think of it as a basic criterion that allows us to exclude non-responsive neurons, and to compare the frequencies of neurons that fulfill this criterion.
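      The trade-off discussed above can be made concrete with a short calculation. This is a sketch under a simplifying assumption not stated in the exchange: that the 11 per-stimulus tests for a non-responsive neuron are independent, each at α = 0.05. The helper `familywise_error_rate` is hypothetical and only illustrates the arithmetic; it also shows that a Bonferroni-style threshold, α/n, changes with the number of stimuli, which is the cross-study inconsistency noted in the response.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


alpha, n_stimuli = 0.05, 11

# Uncorrected: chance that a neuron responsive to none of the stimuli
# is still flagged as responsive to at least one of them.
print(f"FWER, uncorrected: {familywise_error_rate(alpha, n_stimuli):.3f}")  # 0.431

# Bonferroni-style per-test threshold depends on how many stimuli were used.
bonferroni_alpha = alpha / n_stimuli
print(f"Bonferroni per-test threshold: {bonferroni_alpha:.5f}")
print(f"FWER, Bonferroni: {familywise_error_rate(bonferroni_alpha, n_stimuli):.3f}")
```

      Responses of the same neuron to different urine samples are unlikely to be fully independent, so the independence-based figure is an approximation rather than an exact false-positive rate.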

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The study by Pinho et al. presents a novel behavioral paradigm for investigating higher-order conditioning in mice. The authors developed a task that creates associations between light and tone sensory cues, driving mediated learning. They observed sex differences in task acquisition, with females demonstrating faster mediated learning compared to males. Using fiber photometry and chemogenetic tools, the study reveals that the dorsal hippocampus (dHPC) plays a central role in encoding mediated learning. These findings are crucial for understanding how environmental cues, which are not directly linked to positive/negative outcomes, contribute to associative learning. Overall, the study is well-designed, with robust results, and the experimental approach aligns with the study's objectives.

      Strengths:

      (1) The authors develop a robust behavioral paradigm to examine higher-order associative learning in mice.

      (2) They discover a sex-specific component influencing mediated learning, with females exhibiting enhanced learning abilities.

      (3) Using fiber photometry and chemogenetic techniques, the authors identify the dorsal hippocampus, but not the ventral hippocampus, as playing a crucial role in encoding mediated learning.

      Weaknesses:

      (1) The study would be strengthened by further elaboration on the rationale for investigating specific cell types within the hippocampus.

      We will add more information to better explain the rationale of our experiments and/or manipulations.

      (2) The analysis of photometry data could be improved by distinguishing between early and late responses, as well as enhancing the overall presentation of the data.

      We will provide new photometry analysis to differentiate between early and late responses during stimuli presentations.

      (3) The manuscript would benefit from revisions to improve clarity and readability.

      We will improve the clarity and readability of our manuscript.

      Reviewer #2 (Public review):

      Summary:

      Pinho et al. developed a new auditory-visual sensory preconditioning procedure in mice and examined the contribution of the dorsal and ventral hippocampus to learning in this task. Using photometry they observed activation of the dorsal and ventral hippocampus during sensory preconditioning and conditioning. Finally, the authors combined their sensory preconditioning task with DREADDs to examine the effect of inhibiting specific cell populations (CaMKII and PV) in the DH on the formation and retrieval/expression of mediated learning.

      Strengths:

      The authors provide one of the first demonstrations of auditory-visual sensory preconditioning in male mice. Research on the neurobiology of sensory preconditioning has primarily used rats as subjects. The development of a robust protocol in mice will be beneficial to the field, allowing researchers to take advantage of the many transgenic mouse lines. Indeed, in this study, the authors take advantage of a PV-Cre mouse line to examine the role of hippocampal PV cells in sensory preconditioning.

      Weaknesses:

      (1) The authors report that sensory preconditioning was observed in both male and female mice. However, their data only supports sensory preconditioning in male mice. In female mice, both paired and unpaired presentations of the light and tone in stage 1 led to increased freezing to the tone at test. In this case, fear to the tone could be attributed to factors other than sensory preconditioning, for example, generalization of fear between the auditory and visual stimulus.

      To address the pertinent doubt raised by the reviewer, we will perform new experiments to generate a new unpaired group of female mice by increasing the temporal interval between light and tone exposure during the preconditioning phase. We believe these new results will provide additional information to better understand the performance of female mice in sensory preconditioning.

      (2) In the photometry experiment, the authors report an increase in neural activity in the hippocampus during both phase 1 (sensory preconditioning) and phase 2 (conditioning). In the subsequent experiment, they inhibit neural activity in the DH during phase 1 (sensory preconditioning) and the probe test, but do not include inhibition during phase 2 (conditioning). It was not clear why they didn't carry forward investigating the role of the hippocampus during phase 2 conditioning. Sensory preconditioning could occur due to the integration of the tone and shock during phase two, or retrieval and chaining of the tone-light-shock memories at test. These two possibilities cannot be differentiated based on the data. Given that we do not know at which stage the mediated learning is occurring, it would have been beneficial to additionally include inhibition of the DH during phase 2.

      We will perform new experiments to generate novel data by inhibiting CaMKII-positive neurons of the dorsal hippocampus during the conditioning phase.

      (3) In the final experiment, the authors report that inhibition of the dorsal hippocampus during the sensory preconditioning phase blocked mediated learning. While this may be the case, the failure to observe sensory preconditioning at test appears to be due more to an increase in baseline freezing (during the stimulus off period), rather than a decrease in freezing to the conditioned stimulus. Given the small effect, this study would benefit from an experiment validating that administration of J60 inhibited DH cells. Further, given that the authors did not observe any effect of DREADD inhibition in PV cells, it would also be important to validate successful cellular silencing in this protocol.

      By combining chemogenetic and fiber photometry approaches, we will perform control experiments to demonstrate that our chemogenetic manipulations decrease the activity of CaMKII- or PV-positive neurons in the dorsal and ventral hippocampus.

      Reviewer #3 (Public review):

      Summary:

      Pinho et al. investigated the role of the dorsal vs ventral hippocampus and the sex differences in mediated learning. While previous studies already established the engagement of the hippocampus in sensory preconditioning, the authors here took advantage of freely-moving fiber photometry recording and chemogenetics to observe and manipulate sub-regions of the hippocampus (dorsal vs. ventral) in a cell-specific manner. The authors first found sex differences in the preconditioning phase of a sensory preconditioning procedure, where males required more preconditioning training than females for mediated learning to manifest, and where females displayed evidence of mediated learning even when neutral stimuli were never presented together within the session.

      After validation of a sensory preconditioning procedure in mice using light and tone neutral stimuli and a mild foot shock as the unconditioned stimulus, the authors used fiber photometry to record from all neurons vs. parvalbumin-positive (PV+) neurons only in the dorsal hippocampus or ventral hippocampus of male mice during both preconditioning and conditioning phases. They found increased activity of all neurons, as well as of PV+ neurons only, in both sub-regions of the hippocampus during both preconditioning and conditioning phases. Finally, the authors found that chemogenetic inhibition of CaMKII+ neurons in the dorsal, but not ventral, hippocampus specifically prevented the formation of an association between the two neutral stimuli (i.e., light and tone cues), but not the direct association between the light cue and the mild foot shock. This set of data: (1) validates mediated learning in mice using a sensory preconditioning protocol, and stresses the importance of taking sex effects into account; (2) validates the recruitment of the dorsal and ventral hippocampi during preconditioning and conditioning phases; and (3) further establishes the specific role of CaMKII+ neurons in the dorsal but not ventral hippocampus in the formation of an association between two neutral stimuli, but not between a neutral stimulus and a mild foot shock.

      Strengths:

      The authors developed a sensory preconditioning procedure in mice to investigate mediated learning using light and tone cues as neutral stimuli, and a mild foot shock as the unconditioned stimulus. They provide evidence of a sex effect in the formation of light-cue association. The authors took advantage of fiber-photometry and chemogenetics to target sub-regions of the hippocampus, in a cell-specific manner and investigate their role during different phases of a sensory conditioning procedure.

      Weaknesses:

      The authors went further than previous studies by investigating the role of sub-regions of the hippocampus in mediated learning, however, there are several weaknesses that should be noted:

      (1) This work first validates mediated learning in a sensory preconditioning procedure using light and tone cues as neutral stimuli and a mild foot shock as the unconditioned stimulus, in both males and females. They found interesting sex differences at the behavioral level, but then only focused on male mice when recording and manipulating the hippocampus. The authors do not address sex differences at the neural level.

      As discussed above, we will perform an additional experiment to evaluate the presence of reliable sensory preconditioning in female mice. In addition, although examining sex differences at the neural level would be very interesting, we think that it is beyond the scope of the present work. However, we will mention this issue/limitation in the Discussion of the new version of the manuscript.

      (2) As expected in fear conditioning, the range of inter-individual differences is quite high. Mice that didn't develop a strong light-->shock association, as evidenced by a lower percentage of freezing during the Probe Test Light phase, should manifest a low percentage of freezing during the Probe Test Tone phase. It would be interesting to test for a correlation between the level of freezing during mediated vs test phases.

      We will provide correlations between the behavioral responses in both probe tests.

      (3) The use of a synapsin promoter to transfect neurons in a non-specific manner does not provide much information. The authors applied a more specific approach to target PV+ neurons only, and it would have been more informative to keep with this cell-specific approach, for example by looking also at somatostatin+ inter-neurons.

      We will better justify the use of specific promoters and the targeting of PV-positive neurons. We will also add discussion on potential interesting future experiments such as the targeting of other GABAergic subtypes.

      (4) The authors observed event-related Ca2+ transients on hippocampal pan-neurons and PV+ inter-neurons using fiber photometry. They then used chemogenetics to inhibit CaMKII+ hippocampal neurons, which does not logically follow. It does not undermine the main finding of CaMKII+ neurons of the dorsal, but not ventral, hippocampus being involved in the preconditioning, but not conditioning, phase. However, observing CaMKII+ neurons (using fiber photometry) in mice running the same task would be more informative, as it would indicate when these neurons are recruited during different phases of sensory preconditioning. Applying then optogenetics to cancel the observed event-related transients (e.g., during the presentation of light and tone cues, or during the foot shock presentation) would be more appropriate.

      We will perform new experiments to analyze the activity of CaMKII-positive neurons during light-tone associations in the preconditioning phase in male mice.

      (5) Probe tests always start with the "Probe Test Tone", followed by the "Probe Test Light". "Probe Test Tone" consists of an extinction session, which could affect the freezing response during "Probe Test Light" (e.g., Polack et al. (http://dx.doi.org/10.3758/s13420-013-0119-5)). Preferably, adding a group of mice with a Probe Test Light with no Probe Test Tone could help clarify this potential issue. The authors should at least discuss the possibility that the tone extinction session prior to the "Probe Test Light" could have affected the freezing response to the light cue.

      We will add discussion on this issue raised by the reviewer.

      Reviewer #4 (Public review):

      Summary

      Pinho et al use in vivo calcium imaging and chemogenetic approaches to examine the involvement of hippocampal sub-regions across the different stages of a sensory preconditioning task in mice. They find clear evidence for sensory preconditioning in male but not female mice. They also find that, in the male mice, CaMKII-positive neurons in the dorsal hippocampus: (1) encode the audio-visual association that forms in stage 1 of the task, and (2) retrieve/express sensory preconditioned fear to the auditory stimulus at test. These findings are supported by evidence that ranges from incomplete to convincing. They will be valuable to researchers in the field of learning and memory.

      Abstract

      Please note that sensory preconditioning doesn't require the stage 1 stimuli to be presented repeatedly or simultaneously.

      We will correct this sentence in the abstract.

      "Finally, we combined our sensory preconditioning task with chemogenetic approaches to assess the role of these two hippocampal subregions in mediated learning."

      This implies some form of inhibition of hippocampal neurons in stage 2 of the protocol, as this is the only stage of the protocol that permits one to make statements about mediated learning. However, it is clear from what follows that the authors interrogate the involvement of hippocampal sub-regions in stages 1 and 3 of the protocol - not stage 2. As such, most statements about mediated learning throughout the paper are potentially misleading (see below for a further elaboration of this point). If the authors persist in using the term mediated learning to describe the response to a sensory preconditioned stimulus, they should clarify what they mean by mediated learning at some point in the introduction. Alternatively, they might consider using a different phrase such as "sensory preconditioned responding".

      Throughout the text, we will avoid the term “mediated learning” and replace it with more accurate terms. In addition, we will interrogate the role of the dHPC in Stage 2, as noted above.

      Introduction

      "Low-salience" is used to describe stimuli such as tone, light, or odour that do not typically elicit responses that are of interest to experimenters. However, a tone, light, or odour can be very salient even though they don't elicit these particular responses. As such, it would be worth redescribing the "low-salience" stimuli in some other terms.

      We will replace “low-salience” with “innocuous”.

      "These higher-order conditioning processes, also known as mediated learning, can be captured in laboratory settings through sensory preconditioning procedures2,6-11."

      Higher-order conditioning and mediated learning are not interchangeable terms: e.g., some forms of second-order conditioning are not due to mediated learning. More generally, the use of mediated learning is not necessary for the story that the authors develop in the paper and could be replaced for accuracy and clarity. E.g., "These higher-order conditioning processes can be studied in the laboratory using sensory preconditioning procedures2,6-11."

      Throughout the text, we will avoid the term “mediated learning” and replace it with more accurate terms.

      In reference to Experiment 2, it is stated that: "However, when light and tone were separated on time (Unpaired group), male mice were not able to exhibit mediated learning response (Figure 2B) whereas their response to the light (direct learning) was not affected (Figure 2D). On the other hand, female mice still present a lower but significant mediated learning response (Figure 2C) and normal direct learning (Figure 2E). Finally, in the No-Shock group, both male (Figure 2B and 2D) and female mice (Figure 2C and 2E) did not present either mediated or direct learning, which also confirmed that the exposure to the tone or light during Probe Tests do not elicit any behavioral change by themselves as the presence of the electric footshock is required to obtain a reliable mediated and direct learning responses."

      The absence of a difference between the paired and unpaired female mice should not be described as "significant mediated learning" in the latter. It should be taken to indicate that performance in the females is due to generalization between the tone and light. That is, there is no sensory preconditioning in the female mice. The description of performance in the No-shock group really shouldn't be in terms of mediated or direct learning: that is, this group is another control for assessing the presence of sensory preconditioning in the group of interest. As a control, there is no potential for them to exhibit sensory preconditioning, so their performance should not be described in a way that suggests this potential.

      We will rewrite the text to address the valid points raised by the Reviewer.

      Methods - Behavior

      I appreciate the reasons for testing the animals in a new context. This does, however, raise other issues that complicate the interpretation of any hippocampal engagement: e.g., exposure to a novel context may engage the hippocampus for exploration/encoding of its features - hence, it is engaged for retrieving/expressing sensory preconditioned fear to the tone. This should be noted somewhere in the paper given that one of its aims is to shed light on the broader functioning of the hippocampus in associative processes.

      We will discuss this aspect further in the manuscript.

      This general issue - that the conditions of testing were such as to force engagement of the hippocampus - is amplified by two further features of testing with the tone. The first is the presence of background noise in the training context and its absence in the test context. The second is the fact that the tone was presented for 30 s in stage 1 and then continuously for 180s at test. Both changes could have contributed to the engagement of the hippocampus as they introduce the potential for discrimination between the tone that was trained and tested.

      We will address the point raised by the Reviewer in the manuscript.

      Results - Behavior

      The suggestion of sex differences based on differences in the parameters needed to generate sensory preconditioning is interesting. Perhaps it could be supported through some set of formal analyses. That is, the data in supplementary materials may well show that the parameters needed to generate sensory preconditioning in males and females are not the same. However, there needs to be some form of statistical comparison to support this point. As part of this comparison, it would be neat if the authors included body weight as a covariate to determine whether any interactions with sex are moderated by body weight.

      We will add statistical comparisons between male and female mice.

      What is the value of the data shown in Figure 1 given that there are no controls for unpaired presentations of the sound and light? In the absence of these controls, the experiment cannot have shown that "Female and male mice show mediated learning using an auditory-visual sensory preconditioning task" as implied by its title. Minimally, this experiment should be relabelled.

      We will relabel Figure 1.

      "Altogether, this data confirmed that we successfully set up an LTSPC protocol in mice and that this behavioral paradigm can be used to further study the brain circuits involved in higher-order conditioning."

      Please insert the qualifier that LTSPC was successfully established in male mice. There is no evidence of LTSPC in female mice.

      We will perform new experiments to determine whether SPC can also be observed in female mice.

      Results - Brain

      "Notably, the inhibition of CaMKII-positive neurons in the dHPC (i.e. J60 administration in DREADD-Gi mice) during preconditioning (Figure 4B), but not before the Probe Test 1 (Figure 4B), fully blocked mediated, but not direct learning (Figure 4D)."

      The right panel of Figure 4B indicates no difference between the controls and Group DPC in the percent change in freezing from OFF to ON periods of the tone. How does this fit with the claim that CaMKII-positive neurons in the dorsal hippocampus regulate associative formation during the session of tone-light exposures in stage 1 of sensory preconditioning?

      We will rephrase this section of the Results and expand the Discussion so that the text reflects only what the graphs show. We will clarify that the group in which dHPC activity was inhibited during preconditioning is the only one in which the % change is not significantly different from 0 (unlike the control group and the group in which dHPC activity was modulated during the test).

      Discussion

      "When low salience stimuli were presented separated on time or when the electric footshock was absent, mediated and direct learning were abolished in male mice. In female mice, although light and tone were presented separately during the preconditioning phase, mediated learning was reduced but still present, which implies that female mice are still able to associate the two low-salience stimuli."

      This doesn't quite follow from the results. The failure of the female unpaired mice to withhold their freezing to the tone should not be taken to indicate the formation of a light-tone association across the very long interval that was interpolated between these stimulus presentations. It could and should be taken to indicate that, in female mice, freezing conditioned to the light simply generalized to the tone (i.e., these mice could not discriminate well between the tone and light).

      We will rewrite this part depending on the results observed in female mice.

      "Indeed, our data suggests that when hippocampal activity is modulated by the specific manipulation of hippocampal subregions, this brain region is not involved during retrieval."

      Does this relate to the results that are shown in the right panel of Figure 4B, where there is no significant difference between the different groups? If so, how does it fit with the results shown in the left panel of this figure, where differences between the groups are observed?

      We will rewrite this sentence to describe our results clearly, and we will also revise all statistical analyses.

      "In line with this, the inhibition of CaMKII-positive neurons from the dorsal hippocampus, which has been shown to project to the restrosplenial cortex56, blocked the formation of mediated learning."

      Is this a reference to the findings shown in Figure 4B and, if so, which of the panels exactly? That is, one panel appears to support the claim made here while the other doesn't. In general, what should the reader make of data showing the percent change in freezing from stimulus OFF to stimulus ON periods?

      We will rewrite the text to clearly describe our results, and we will also revise all the statistical analysis. In addition, we will better explain the data showing the % of change.

    1. Author response:

      Many thanks for assessing our submission. We are grateful for the reviews and recommendations that will inform a revised version of the paper, which will include additional data and modified text to take into account the reviewers’ comments.

      We appreciate Reviewer #1’s suggestion regarding the use of mutational work to demonstrate that collagen binding is indeed dependent on the T-shaped fold. However, we believe that this approach is neither feasible nor necessary for our study. Instead, we propose to measure collagen binding to a monomeric form of M3, which preserves all residues including the ones involved in binding, but cannot form the T-shaped structure. This will achieve the same as unravelling the T fold through mutations, but at the same time removes the risk of directly affecting binding through altering residues that are involved in both binding and definition of the T fold.

      Structural biology is by its nature observational, which is not a limitation but the very purpose of this approach. Our study goes beyond observing structures. We identify a critical residue within a previously mapped binding site, and demonstrate through mutagenesis a causal link between presence of this residue on a tertiary fold and collagen binding activity. We will firm up our mutational experiments with a characterisation of the M3 Tyr96 variants to confirm that these mutations did not affect the overall fold. We further demonstrate that the interaction between M3 and collagen promotes biofilm formation as observed in patient biopsies and a tissue model of infection. We show that other streptococci, that do not possess a surface protein presenting collagen binding sites like M3, do not form collagen-dependent biofilm. We therefore do not think that criticising our study for being almost entirely observational is justified. 

      We thank Reviewer #2 for the thorough analysis of our reported findings. The main criticism here concerns whether binding of emm3 streptococci differs between types of collagen. We will address this point in the revised manuscript. Our collagen peptide binding assays together with the structural data identify the collagen triple helix as the binding site for M3. While collagen types differ in their functions and morphology in various tissues, they all have in common triple-helical tropocollagen regions (with very high sequence similarity) that are non-specifically recognised by M3. Therefore, our data in conjunction with the body of published work showing binding of M3 to collagens I, II, III and IV suggest it is highly likely that emm3 streptococci will indeed bind to many if not all types of collagen in the same manner. Whether this means all collagen types, in the various tissues where they occur, are targeted by emm3 streptococci is a very interesting question, however one that goes beyond the scope of our study.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This work considers the biases introduced into pathogen surveillance due to congregation effects, and also models homophily and variants/clades. The results are primarily quantitative assessments of this bias but some qualitative insights are gained e.g. that initial variant transmission tends to be biased upwards due to this effect, which is closely related to classical founder effects.

      Strengths:

      The model considered involves a simplification of the process of congregation using multinomial sampling that allows for a simpler and more easily interpretable analysis.

      Weaknesses:

      This simplification removes some realism, for example, detailed temporal transmission dynamics of congregations.

      We appreciate Reviewer #1's comments. We hope our framework, like the classic SIR model, can be adapted in the future to build more complex and realistic models.

      Reviewer #2 (Public review):

      Summary:

      In "Founder effects arising from gathering dynamics systematically bias emerging pathogen surveillance" Bradford and Hang present an extension to the SIR model to account for the role of larger than pairwise interactions in infectious disease dynamics. They explore the impact of accounting for group interactions on the progression of infection through the various sub-populations that make up the population as a whole. Further, they explore the extent to which interaction heterogeneity can bias epidemiological inference from surveillance data in the form of IFR and variant growth rate dynamics. This work advances the theoretical formulation of the SIR model and may allow for more realistic modeling of infectious disease outbreaks in the future.

      Strengths:

      (1) This work addresses an important limitation of standard SIR models. While this limitation has been addressed previously in the form of network-based models, those are, as the authors argue, difficult to parameterize to real-world scenarios. Further, this work highlights critical biases that may appear in real-world epidemiological surveillance data. Particularly, over-estimation of variant growth rates shortly after emergence has led to a number of "false alarms" about new variants over the past five years (although also to some true alarms).

      (2) While the results presented here generally confirm my intuitions on this topic, I think it is really useful for the field to have it presented in such a clear manner with a corresponding mathematical framework. This will be a helpful piece of work to point to to temper concerns about rapid increases in the frequency of rare variants.

      (3) The authors provide a succinct derivation of their model that helps the reader understand how they arrived at their formulation starting from the standard SIR model.

      (4) The visualizations throughout are generally easy to interpret and communicate the key points of the authors' work.

      (5) I thank the authors for providing detailed code to reproduce manuscript figures in the associated GitHub repo.

      Weaknesses:

      (1) The authors argue that network-based SIR models are difficult to parameterize (line 66), however, the model presented here also has a key parameter, namely P_n, or the distribution of risk groups in the population. I think it is important to explore the extent to which this parameter can be inferred from real-world data to assess whether this model is, in practice, any easier to parameterize.

      (2) The authors explore only up to four different risk groups, accounting for only four-wise interactions. But, clearly, in real-world settings, there can be much larger gatherings that promote transmission. What was the justification for setting such a low limit on the maximum group size? I presume it's due to computational efficiency, which is understandable, but it should be discussed as a limitation.

      (3) Another key limitation that isn't addressed by the authors is that there may be population structure beyond just risk heterogeneity. For example, there may be two separate (or weakly connected) high-risk sub-groups. This will introduce temporal correlation in interactions that are not (and cannot easily be) captured in this model. My instinct is that this would dampen the difference between risk groups shown in Figure 2A. While I appreciate the authors' desire to keep their model relatively simple, I think this limitation should be explicitly discussed as it is, in my opinion, relatively significant.

      We appreciate Reviewer 2's thoughtful comments and wish to address some of the weaknesses:

      We agree that inferring P_n from real data will be challenging, but think this is an important direction for future research. Further, we’d like to reframe our claim that our approach is "easier to parameterize" than network models. Rather, P_n has fewer degrees of freedom than analogous network models, just as many different networks can share the same degree distribution. Fewer degrees of freedom mean that we expect our model to suffer from fewer identifiability issues when fitting to data, though non-identifiability is often inescapable in models of this nature (e.g., \beta and \gamma in the SIR model are not uniquely identifiable during exponential growth). Whether this is more or less accurate is another question. The classic bias-variance tradeoff suggests that a model of moderate complexity trained on one data set can fit future data better than overly simple or overly complex models.

      We chose four risk groups for purposes of illustration, but this can be increased arbitrarily. It should be noted that the simulation bottleneck when increasing the number of risk groups is numerical, due to the stiffness of the ODEs. This arises because the nonlinearity of infection terms scales with the number of risk groups (e.g., ~ \beta * S * I^3 for 4 risk groups). As such, a careful choice of numerical solvers may be required when integrating the ODEs. Meanwhile, this is not an issue for stochastic, individual-based implementations (e.g., Gillespie). As for how well this captures super-spreading, we believe choosing smaller risk groups does not hinder modeling disease spread at large gatherings. Consider a statistical interpretation, where individuals at a large gathering engage in a series of smaller interactions over time (e.g., 2/3/4/etc person conversations). The key determinants of the resulting gathering size distribution at any one large gathering are the number of individuals within some shared proximity over time and the infectiousness/dispersal of the pathogen. Of course, whether this interpretation is a sufficient approximation for classic super-spreading events (e.g., funerals during the 2014-2015 West Africa Ebola outbreak) is a matter of debate. Our framework is best interpreted at a population level where the effects of any single gathering are washed out by the overall gathering distribution, P_n. As the prior weakness highlighted, establishing P_n is challenging, but we believe empirically measuring proxies of it may provide future insight into how behavior impacts disease spread. For example, prior work has combined contact tracing and co-location data from WiFi network connections to estimate the distribution of contacts per individual, and its degree of overdispersion (Petros et al. Med 2022).
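      To make the scaling argument concrete, the following is a minimal sketch, not the authors' implementation, under a hypothetical simplification: a gathering of size n draws its members multinomially from the population fractions s, i and r = 1 - s - i, and each susceptible attendee escapes each infectious attendee independently with probability 1 - p, where p is an illustrative per-contact transmission probability. Expanding the expected number of new infections yields polynomial terms up to s * i^(n-1), the kind of high-order nonlinearity that makes the ODEs stiffer as gatherings grow. The function names and p are assumptions introduced here for illustration.

```python
from itertools import product
from math import factorial


def multinomial_pmf(counts, probs):
    """Probability of an exact (n_S, n_I, n_R) composition under multinomial sampling."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    prob = 1.0
    for c, q in zip(counts, probs):
        prob *= q ** c
    return coef * prob


def expected_new_infections(n, s, i, p):
    """Expected new infections in one gathering of size n (hypothetical model).

    s, i: population fractions that are susceptible and infectious (r = 1 - s - i).
    p: hypothetical per-contact transmission probability.
    """
    r = 1.0 - s - i
    total = 0.0
    # Enumerate every composition (n_s, n_i, n_r) with n_s + n_i + n_r == n.
    for n_s, n_i in product(range(n + 1), repeat=2):
        n_r = n - n_s - n_i
        if n_r < 0:
            continue
        weight = multinomial_pmf((n_s, n_i, n_r), (s, i, r))
        # A susceptible is infected unless it escapes all n_i infectious attendees.
        total += weight * n_s * (1.0 - (1.0 - p) ** n_i)
    return total
```

      For n = 2 the sum reduces to 2 * p * s * i, the familiar mass-action term, while larger n adds higher powers of i. This enumeration is only meant to expose the polynomial structure; it is not a solver for the full ODE system.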

      We chose to introduce our framework in a simple SIR context familiar to many readers. This decision does not in any way limit applying it to settings with more population structure. Rather, we believe our framework is easily adaptable and that our presentation (hopefully) makes it clear how to do this. For example, two weakly connected groups could be easily achieved by (for each gathering) first sampling the preferred group and then sampling from the population in a biased manner. The biased sampling could even be a function of gathering sizes, time, etc. The resulting infection terms are still (sums of) multinomials. More generally, the sampling probabilities for an individual of some type need not be its frequency (e.g., S/N, I/N). Indeed, we believe generating models with complex social interactions is both simplified and made more robust by focusing on modeling the generative process of attending gatherings.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This paper describes technically impressive measurements of calcium signals near synaptic ribbons in goldfish bipolar cells. The data presented provides high spatial and temporal resolution information about calcium concentrations along the ribbon at various distances from the site of entry at the plasma membrane. This is important information. Important gaps in the data presented mean that the evidence for the main conclusions is currently inadequate.

      Strengths

      (1) The technical aspects of the measurements are impressive. The authors use calcium indicators bound to the ribbon and high-speed line scans to resolve changes with a spatial resolution of ~250 nm and a temporal resolution of less than 10 ms. These spatial and temporal scales are much closer to those relevant for vesicle release than previous measurements.

      (2) The use of calcium indicators with very different affinities and different intracellular calcium buffers helps provide confirmation of key results.

      Thank you very much for this positive evaluation of our work.

      Weaknesses

      (1) Multiple key points of the paper lack statistical tests or summary data from populations of cells. For example, the text states that the proximal and distal calcium kinetics in Figure 2A differ. This is not clear from the inset to Figure 2A - where the traces look like scaled versions of each other. Values for time to half-maximal peak fluorescence are given for one example cell but no statistics or summary are provided. Figure 8 shows examples from one cell with no summary data. This issue comes up in other places as well.
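      As an illustration of the kind of population summary requested here, one common way to obtain the time to half-maximal peak fluorescence from a sampled trace is to interpolate linearly at the first crossing of 50% of the peak, then report mean and SEM across cells. This is a minimal sketch, assuming baseline-subtracted traces with a positive peak, uniform sampling, and time zero at stimulus onset; the function names are hypothetical and this is not the analysis used in the manuscript.

```python
import numpy as np


def time_to_half_max(trace, dt):
    """Time at which a baseline-subtracted trace first reaches 50% of its peak.

    trace: 1-D array of dF/F samples taken every dt seconds from stimulus onset.
    Assumes a positive peak; interpolates linearly between the bracketing samples.
    """
    trace = np.asarray(trace, dtype=float)
    half = 0.5 * trace.max()
    k = int(np.argmax(trace >= half))  # index of the first sample at or above half-max
    if k == 0:
        return 0.0
    y0, y1 = trace[k - 1], trace[k]
    return dt * (k - 1 + (half - y0) / (y1 - y0))


def summarize(values):
    """Mean and standard error of the mean across cells."""
    v = np.asarray(values, dtype=float)
    return v.mean(), v.std(ddof=1) / np.sqrt(v.size)
```

      Applying time_to_half_max separately to proximal and distal traces from each cell gives paired values that can then be compared with a paired test across the population.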

      Thank you for this feedback. We will address this in our revised manuscript.

      (2) Figure 5 is confusing. The figure caption describes red, green, and blue traces, but the figure itself has only two traces in each panel and none are red, green, or blue. It's not possible currently to evaluate this figure.

      Thank you for pointing out this oversight. The figure indeed only shows the proximal and distal calcium signals, but not the cytoplasmic ones. The figure will be corrected in our revised manuscript.

      (3) The rise time measurements in Figure 2 are very different for low and high-affinity indicators, but no explanation is given for this difference. Similarly, the measurements of peak calcium concentration in Figure 4 are very different between the two indicators. That might suggest that the high-affinity indicator is strongly saturated, which raises concerns about whether that is impacting the kinetic measurements.

      As mentioned in the text, we believe that the high-affinity version is partially saturated. This is most likely to be a problem for strong depolarizations and for signals near the membrane. The higher-affinity indicators are more useful for reporting calcium levels on the ribbon after the depolarization, when the signal from the low-affinity indicators is small. We will address this in the discussion of the revision.
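      The saturation argument can be illustrated with the standard equilibrium expression for a 1:1 indicator, where the bound fraction is [Ca2+] / ([Ca2+] + Kd). The sketch below uses two hypothetical Kd values chosen only for illustration (not those of the indicators in the study) and ignores binding kinetics, so it speaks to compression of peak amplitudes rather than to rise times directly.

```python
def fraction_bound(ca_uM, kd_uM):
    """Equilibrium fraction of a 1:1 Ca2+ indicator in the Ca2+-bound state."""
    return ca_uM / (ca_uM + kd_uM)


# Hypothetical dissociation constants (uM), chosen only for illustration.
KD_HIGH_AFFINITY = 0.2
KD_LOW_AFFINITY = 20.0

for kd in (KD_HIGH_AFFINITY, KD_LOW_AFFINITY):
    f5 = fraction_bound(5.0, kd)
    f10 = fraction_bound(10.0, kd)
    # Ratio of bound fractions when [Ca2+] doubles from 5 to 10 uM.
    print(f"Kd={kd} uM: bound {f5:.3f} -> {f10:.3f}, ratio {f10 / f5:.2f}")
```

      With these assumed values, doubling [Ca2+] from 5 to 10 uM raises the bound fraction of the high-affinity indicator by only about 2% (0.962 to 0.980), compared with about 67% (0.200 to 0.333) for the low-affinity one. This is consistent with using the high-affinity indicator mainly for the lower calcium levels that follow the depolarization.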

      Reviewer #2 (Public review):

      Summary:

      The study introduces new tools for measuring intracellular Ca2+ concentration gradients around retinal rod bipolar cell (rbc) synaptic ribbons. This is done by comparing the Ca2+ profiles measured with mobile Ca2+ indicator dyes versus ribbon-tethered (immobile) Ca2+ indicator dyes. The Ca2+ imaging results provide a straightforward demonstration of Ca2+ gradients around the ribbon and validate their experimental strategy. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in Ca2+ domains as a function of Ca2+ buffering. In addition, the authors try to demonstrate that there is heterogeneity among synaptic ribbons within an individual rbc terminal.

      Strengths:

      The study introduces a new set of tools for estimating Ca2+ concentration gradients at ribbon AZs, and the experimental results are accompanied by an open-source, computational model that nicely describes Ca2+ buffering at the rbc synaptic ribbon. In addition, the dissociated retinal preparation remains a valuable approach for studying ribbon synapses. Lastly, excellent EM.

      Thank you very much for this appreciation.

      Weaknesses:

      Heterogeneity in the spatiotemporal dynamics of Ca2+ influx was not convincingly related to ribbon size, nor was the functional relevance of Ca2+ dynamics to rod bipolars demonstrated (e.g., exocytosis to different postsynaptic targets). In addition, the study would benefit from the inclusion of the Ca2+ currents that were recorded in parallel with the Ca2+ imaging.

      Thank you for this critique. We agree that the relationship between size and Ca2+ signal is not established by our recordings. By analogy to the hair cell literature, we believe that it is a reasonable hypothesis, but more studies will be necessary to definitively determine whether the signal relates to the ribbon size or synaptic signaling. This will be addressed in future experiments.

      We will include the Ca2+ currents in the revision.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors have developed a new Ca indicator conjugated to the peptide, which likely recognizes synaptic ribbons, and have measured microdomain Ca near synaptic ribbons at retinal bipolar cells. This interesting approach allows one to measure Ca close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. Though microdomain Ca at the active zone of ribbon synapses has been measured by Hudspeth and Moser, the new study uses the peptide recognizing synaptic ribbons, potentially measuring the Ca concentration relatively proximal to the release sites.

      Strengths:

      The study is in principle technically well done, and the peptide approach is technically interesting, which allows one to image Ca near the particular protein complexes. The approach is potentially applicable to other types of imaging.

      Thank you very much for this appreciation.

      Weaknesses:

      Peptides may not be entirely specific, and the genetic approach tagging particular active zone proteins with fluorescent Ca indicator proteins may well be more specific. I also feel that "Nano-physiology" is overselling, because the measured Ca is most likely the local average surrounding synaptic ribbons. With this approach, nobody knows about the real release site Ca or the Ca relevant for synaptic vesicle replenishment. It is rather "microdomain physiology" which measures the local Ca near synaptic ribbons, relatively large structures responsible for fusion, replenishment, and recycling of synaptic vesicles.

      The peptide approach has been used fairly extensively in the ribbon synapse field, and the evidence that it efficiently labels the ribbon is well established; however, we acknowledge that the peptide is in equilibrium with a cytoplasmic pool. Thus, some of the signal arises from this cytoplasmic pool. The alternative of a genetically encoded Ca-indicator concatenated to a ribbon protein would not have this problem, but would offer less flexibility in changing calcium indicators. We believe both approaches have their merits, each with separate advantages and disadvantages.

      As for the nano vs. micro argument, we certainly do not want to suggest that we are measuring the nanodomains, tens of nanometers across, that drive neurotransmitter release, but we do believe we are in the sub-micrometer range (hundreds of nanometers). We chose the term based on its usage by other authors to describe similar measurements (Neef et al., 2018; https://doi.org/10.1038/s41467-017-02612-y), but we see the reviewer’s point. To avoid confusion, we will change the title in the revision.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This retrospective study provides new data regarding the prevalence of pain in women with PCOS and its relationship with health outcomes. Using data from electronic health records (EHR), the authors found a significantly higher prevalence of pain among women with PCOS compared to those without the condition: 19.21% of women with PCOS versus 15.8% in non-PCOS women. The highest prevalence of pain was observed among Black or African American (32.11%) and White (30.75%) populations. In addition, women with PCOS and pain have at least a 2-fold increased prevalence of obesity (34.68%) at baseline compared to women with PCOS in general (16.11%). Also, women with PCOS had the highest risk for infertility and T2D, but women with PCOS and pain had higher risks for ovarian cysts and liver disease. Based on these results, the authors suggested the critical need to address pain in the diagnosis and management of PCOS due to its significant impact on patient health outcomes.

      Strengths:

      (1) The problem of pain assessment in PCOS patients is well described, and the authors provide a clear rationale for selecting a retrospective design to investigate this problem.

      (2) The large number of analyzed patient records (76,859,666 women) and their uniformity increase the power of the study. Using Propensity Score Matching reduces the heterogeneity of the compared cohorts and the influence of comorbid conditions.

      (3) Analysis in different ethnic cohorts provides relevant and necessary data regarding the prevalence of pain and its relationship with different health conditions, which will help clinicians diagnose and manage PCOS in women of different ethnicities.

      (4) Assessment of the risk of different health conditions, including PCOS-associated pathology and other common groups of diseases, in PCOS women with or without pain allows one to differentiate the risk of comorbid conditions depending on the presence of one symptom (pelvic or abdominal pain, dysmenorrhea).

      We appreciate the positive feedback on this manuscript. Pain assessment in women with PCOS is of paramount interest and because of a gap in this research area, we are trying to address it.

      Weaknesses:

      (1) Although the paper has strengths in methodology and data analysis, it also has some weaknesses.

      The lack of a hypothesis doesn't allow us to evaluate the aim and significance of this study.

      We would like to thank the Reviewer for their valuable feedback regarding the hypothesis of this study. We understand that the hypothesis may not have been written clearly under the objectives and we will correct this in the formal revision.

      The primary hypothesis of this study is that women with PCOS experience a higher prevalence of pain (including dysmenorrhea, abdominal pain and pelvic pain) compared to women without PCOS, and that this prevalence varies by racial group. Our hypothesis aims to explore the relationship between PCOS and pain, the associated health risks, and the potential racial disparities in pain prevalence and long-term health outcomes. Additionally, we seek to assess the effect of treatment on reducing pain symptoms in women with PCOS. This study not only examines the immediate burden of pain but also investigates its long-term consequences, including risks of infertility, obesity, and type 2 diabetes.

      To enhance clarity for readers, we will explicitly state this hypothesis in the revised manuscript and ensure that its connection to the study’s objectives is clearly articulated. We appreciate the Reviewer’s insights and will incorporate these refinements to strengthen the manuscript.

      (2) The exclusion criteria don't include conditions that can lead to symptoms similar to PCOS: thyroid diseases, hyperprolactinemia, and congenital adrenal hyperplasia. Thyroid status is not taken into account in the matching criteria. All of these conditions could affect both the prevalence results and the risk assessment.

      We would like to thank the Reviewer for highlighting the need to include these additional conditions that mimic PCOS. After excluding hypothyroidism, hyperprolactinemia, and adrenal hyperplasia from the PCOS and PCOS and pain cohorts, we observed that 7,690 patients (1.65%) with PCOS and 1,854 patients (1.36%) with PCOS and pain were removed. Based on this observation, we plan to add these three conditions to our exclusion criteria and rerun our analysis of disease prevalence and relative risk for our resubmission.

      We will update the manuscript accordingly to reflect these exclusions and ensure clarity in our methodology. Additionally, we will discuss the rationale for excluding these conditions to improve transparency and provide a more precise interpretation of our findings.

      (3) A significant weakness of the study is the absence of a Latin American cohort. The White cohort probably includes Latin Americans or other groups, but the results of the study cannot be extrapolated to particular White ethnicities.

      We appreciate the Reviewer’s suggestion to include Latin American cohorts in studies. In this paper we only used race as a variable and did not incorporate ethnicity. However, for our resubmission we plan to include self-reported ethnicity in our analysis which will capture the Latin American cohort stratified by self-reported race groups. This addition will provide a more comprehensive understanding of racial and ethnic differences in our study population, and we will update the manuscript accordingly to reflect this expansion.

      (4) The authors didn't provide sufficient rationale for future health outcomes and this list didn't include diseases of the digestive system or disorders of thyroid glands, which can also cause abdominal pain.

      We appreciate the Reviewer's comment and understand their concern. Our current results highlight the prevalence of disorders of the digestive system in Figure 2 and in the Results section. To further strengthen our analysis, we plan to include disorders of the digestive system in our relative risk (RR) assessment. However, we will not be able to include the same analysis for thyroid dysfunction, as it will be treated as an exclusion criterion. These updates will be incorporated into the revised manuscript to ensure clarity and completeness.

      Reviewer #2 (Public review):

      Summary:

      The study offers a thorough analysis of the prevalence of pain in women with polycystic ovary syndrome (PCOS) and its associations with health outcomes across various racial groups. Furthermore, the research investigates the prevalence of PCOS and pain among different racial demographics, as well as the increased risk of developing various conditions in comparison to individuals who have PCOS alone.

      Strengths:

      The study emphasizes pain as a significant comorbidity of PCOS, an area that is critically underexplored in existing literature. The findings regarding the increased prevalence of some of the diseases in the PCOS + pain group provide valuable direction for future research and clinical care. I believe physicians should incorporate pain score assessments into their clinical practice to improve patients' quality of life and raise awareness about pain management. If future research focuses on the mechanisms of pain, it would provide a better understanding of pain and allow for a focus on the underlying causes rather than just symptomatic management. The study also highlights the association between PCOS+pain and various comorbidities, such as obesity, hypertension, and type 2 diabetes, as well as conditions like infertility and ovarian cysts, offering a holistic view of the burden of PCOS.

      We sincerely appreciate the Reviewer’s insightful comments. We hope that our findings will encourage further research on the occurrence of pain in women with PCOS and that others will replicate our results to strengthen the evidence in this area. As noted in our introduction, there are currently no standardized abdominal pain score assessments specifically for women with PCOS. We hope that the findings from this study will contribute to efforts toward developing a standardized pain assessment for the PCOS community. In the meantime, further research across more diverse populations will be essential to build a more comprehensive understanding of this issue.

      Weaknesses:

      Due to the nature of the retrospective study, some data may not be readily available in the system. Instead of simply categorizing participants based on whether they experience pain, it would be more useful to employ a pain scale or questionnaire to better understand the severity and type of patients' pain. This approach would allow for a more thorough analysis of pain improvement following treatment with the three widely used medications for PCOS. Additionally, it would be beneficial for the authors to specify subtypes of the disease rather than generalizing conditions, such as mentioning specific digestive system disorders or mental health disorders. The lack of detailed analysis of specific disorders limits the depth of the findings and may lead the authors to draw incorrect conclusions.

      We thank the Reviewer for highlighting the importance of categorizing pain levels experienced by women with PCOS. However, there is currently no standardized pain assessment for abdominal pain, and therefore more research is required before such a classification can be made. Additionally, the electronic health record data we leveraged via the TriNetX platform does not include any pain scale data from unstructured notes. Despite these limitations, this study is an important step toward recognizing abdominal and pelvic pain in women with PCOS. Our findings indicate that women with PCOS report abdominal pain independent of digestive conditions such as irritable bowel syndrome, a condition often associated with pain in this population.

      We would like to thank the Reviewer for their thoughtful comment with respect to subtyping the future health outcomes. To address this, we plan to include the most common diseases associated with PCOS for each general disease group as a supplemental figure in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      comment 1: Lu et al. use their workflow to visualize RNA expression of five enzymes that are each involved in the biosynthetic pathway of different neurotransmitters/modulators, namely chat (cholinergic), gad (GABAergic), tbh (octopaminergic), th (dopaminergic), and tph (serotonergic). In this way, they generate an anatomical atlas of neurons that produce these molecules. Collectively, these markers are referred to as the "neuronpool." They overstate when they write, "The combination of these five types of neurons constitutes a neuron pool that enables the labeling of all neurons throughout the entire body." This statement does not accurately represent the state of our knowledge about the diversity of neurons in S. mediterranea. There are several lines of evidence that support the presence of glutamatergic and glycinergic neurons, including the following. The glutamate receptor agonists NMDA and AMPA both produce seizure-like behaviors in S. mediterranea that are blocked by the application of glutamate receptor antagonists MK-801 and DNQX (which antagonize NMDA and AMPA glutamate receptors, respectively; Rawls et al., 2009). scRNA-Seq data indicate that neurons in S. mediterranea express a vesicular glutamate transporter, a kainate-type glutamate receptor, a glycine receptor, and a glycine transporter (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022). Two AMPA glutamate receptors, GluR1 and GluR2, are known to be expressed in the CNS of another planarian species, D. japonica (Cebria et al., 2002). Likewise, there is abundant evidence for the presence of peptidergic neurons in S. mediterranea (Collins et al., 2010; Fraguas et al., 2012; Ong et al., 2016; Wyss et al., 2022; among others) and in D. japonica (Shimoyama et al., 2016). For these reasons, the authors should not assume that all neurons can be assayed using the five markers that they selected. The situation is made more complex by the fact that many neurons in S. mediterranea appear to produce more than one neurotransmitter/modulator/peptide (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022), which is common among animals (Vaaga et al., 2014; Brunet Avalos and Sprecher, 2021). However, the published literature indicates that there are substantial populations of glutamatergic, glycinergic, and peptidergic neurons in S. mediterranea that do not produce other classes of neurotransmission molecules (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022). Thus, it seems likely that the neuronpool will miss many neurons that only produce glutamate, glycine, or a neuropeptide.

      In response to your comments, we agree that our initial statement regarding the "neuron pool" overstated the extent of neuronal coverage provided by the five selected markers. We have revised the sentence as “The combination of these five types of neurons constitutes a neuron pool that enables the labeling of most of the neurons throughout the entire body, including the eyes, brain, and pharynx”.

      Furthermore, we chose the five neurotransmitter systems (cholinergic, GABAergic, octopaminergic, dopaminergic, and serotonergic) based on their well-characterized roles in planarian neurobiology and the availability of reliable markers. However, we acknowledge the limitations of this approach and recognize that it does not encompass all neuron types, particularly those involved in glutamatergic, glycinergic, and peptidergic signaling, which have been documented in S. mediterranea. We have also added content about other neuron types to the Results section of the revised manuscript: “Additionally, the nervous system of S. mediterranea is complex and is characterized by considerable diversity, including glutamatergic, glycinergic, and peptidergic neurons, and many neurons in S. mediterranea express more than one neurotransmitter or neuropeptide, which adds further complexity to the system. We used five markers as a proof-of-concept illustration. By employing fluorescence in situ hybridization (FISH), we successfully visualized a variety of planarian neurons, including cholinergic (chat<sup>+</sup>), serotonergic (tph<sup>+</sup>), octopaminergic (tbh<sup>+</sup>), GABAergic (gad<sup>+</sup>), and dopaminergic (th<sup>+</sup>) neurons, selected based on their well-characterized roles in planarian neurobiology and the availability of reliable markers (Figure S2A, Supplemental video 2) (Currie et al., 2016). The combination of these five types of neurons constitutes a neuron pool that enables the labeling of most of the neurons throughout the entire body, including the eyes, brain, and pharynx (Figure 1B).”

      comment 2: The authors use their technique to image the neural network of the CNS using antibodies raised against Arrestin, Synaptotagmin, and phospho-Ser/Thr. They document examples of both contralateral and ipsilateral projections from the eyes to the brain in the optic chiasma (Figure 1C-F). These data all seem to be drawn from a single animal in which there appears to be a greater than normal number of nerve fiber defasciculations. It isn't clear how well their technique works for fibers that remain within a nerve tract or the brain. The markers used to image neural networks are broadly expressed, and it's possible that most nerve fibers are too densely packed (even after expansion) to allow for image segmentation. The authors also show a close association between estrella-positive glial cells and nerve fibers in the optic chiasma.

      Thank you for your detailed feedback. While we did not perform segmentation of all neuron fibers, we were able to segment more isolated fibers that were not densely packed within the neural tracts. We use 120 nm resolution to segment neurons along the three axes. Our data show the presence of both contralateral and ipsilateral projections of visual neurons. Although Figure 1C-F shows data from one planarian, we imaged three independent specimens to confirm the consistency of these observations. In the revised manuscript, we have included a discussion on the limitations of TLSM in reconstructing neural networks. In the discussion part, we added “It should be noted that the current resolution for our segmentation may be limited when resolving fibers within densely packed regions of the nerve tracts”.

      comment 3: The authors count all cell types, neuron pool neurons, and neurons of each class assayed. They find that the cell number to body volume ratio remains stable during homeostasis (Figure S3C), and that the brain volume steadily increases with increasing body volume (Figure S3E). They also observe that the proportion of neurons to total body cells is higher in worms 2-6 mm in length than in worms 7-9 mm in length (Figure 2D, S3F). They find that the rate at which four classes of neurons (GABAergic, octopaminergic, dopaminergic, serotonergic) increase relative to the total body cell number is constant (Figure S3G-J). They write: "Since the pattern of cholinergic neurons is the major cell population in the brain, these results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is likely from the cholinergic neurons." This conclusion should not be reached without first directly counting the number of cholinergic neurons and total body cells. Given that glutamatergic, glycinergic, and peptidergic neurons were not counted, it also remains possible that the non-linear dynamics are due (in part or in whole) to one or more of these populations.

      We have revised the statement to “These results suggest that the above observation of non-linear dynamics between neuron number and total cell number is unlikely to arise from the octopaminergic, GABAergic, dopaminergic, and serotonergic neurons. Since our neuron pool may not include glutamatergic, glycinergic, and peptidergic neurons, the non-linear dynamics may arise from cholinergic neurons or other neurons not included in our staining.”

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The proprietary nature of the microscope, protected by a patent, limits the technical details provided, making the method hard to reproduce in other labs.

      Thank you for your comment. We understand the importance of reproducibility and transparency in scientific research. We would like to point out that the detailed design and technical specifications of the TLSM are publicly available in our published work: Chen et al., Cell Reports, 2020. Additionally, the protocol for C-MAP, including the specific experimental steps, is comprehensively described in the methods section of this paper. We believe that these resources should provide sufficient information for other labs to replicate the method.

      (2) The resolution of the analyses is mostly limited to the cellular level, which does not fully leverage the advantages of expansion microscopy. Previous applications of expansion microscopy have revealed finer nanostructures in the planarian nervous system (see Fan et al. Methods in Cell Biology 2021; Wang et al. eLife 2021). It is unclear whether the current protocol can achieve a comparable resolution.

      Thank you for raising this important point. The strength of our C-MAP protocol lies in its fluorescence-protective nature and user convenience. Notably, the sample can be expanded up to 4.5-fold linearly without the need for heating or proteinase digestion, which helps preserve fluorescence signals. In addition, the entire expansion process can be completed within 48 hours. While our current analysis focused on cellular-level structures, our method can achieve comparable or better resolution, and we will add this information to the revised manuscript as follows: “It is important to point out that the strength of our C-MAP protocol lies in its fluorescence-protective nature and user convenience. Notably, the sample can be expanded up to 4.5-fold linearly without the need for heating or proteinase digestion, which helps preserve fluorescence signals. In addition, the entire expansion process can be completed within 48 hours. Based on our research requirements, two spatial resolutions were adopted to image expanded planarians, 2×2×5 μm<sup>3</sup> and 0.5×0.5×1.6 μm<sup>3</sup>. The resolution can be further improved to 500 nm and 120 nm, respectively.”

      (3) The data largely corroborate past observations, while the novel claims are insufficiently substantiated.

      A few major issues with the claims:

      Lines 303-304: While 6G10 is a widely used antibody to label muscle fibers in the planarian, it doesn't uniformly mark all muscle types (Scimone et al. Nature 2017). For a more complete view of muscle fibers, it is important to use a combination of antibodies targeting different fiber types or a generic marker such as phalloidin. This raises fundamental concerns about all the conclusions drawn from Figures 4 and 6 about differences between various muscle types. Additionally, the authors should cite the original paper that developed the 6G10 antibody (Ross et al. BMC Developmental Biology 2015).

      We appreciate the reviewer’s insightful comments and acknowledge that 6G10 does not uniformly label all muscle fiber types. We agree that this limitation should be recognized in the interpretation of our results. We have revised the manuscript to explicitly state the limitations of using 6G10 alone for muscle fiber labeling and highlight the need for additional markers. We have included the following statement in the Results section: “It is noted that previous studies reported that 6G10 does not label all body wall muscles equivalently with the limitation of predominantly labeling circular and diagonal fibers (Scimone et al., 2017; Ross et al., 2015). Our observation may be limited by this preference”. We would also clarify that the primary objective of our study was to demonstrate the application of our 3D tissue reconstruction method in addressing traditional research questions. Nonetheless, we agree that expanding the labeling strategy in future studies would allow for a more thorough investigation of muscle fiber diversity. Relevant citations have been properly revised and updated.

      (4) Lines 371-379: The claim that DV muscles regenerate into longitudinal fibers lacks evidence. Furthermore, previous studies have shown that TFs specifying different muscle types (DV, circular, longitudinal, and intestinal) both during regeneration and homeostasis are completely different (Scimone et al., Nature 2017 and Scimone et al., Current Biology 2018). Single-cell RNAseq data further establishes the existence of divergent muscle progenitors giving rise to different muscle fibers. These observations directly contradict the authors' claim, which is only based on images of fixed samples at a coarse time resolution.

      Thank you for your valuable feedback. Our intent was not to suggest that DV muscles regenerate into longitudinal fibers. Our observations focused on the wound site, where DV muscle fibers appear to reconnect, and longitudinal fibers, along with other muscle types, gradually regenerate to restore the structure of the injured area. We have revised our statement as follows: “During the regeneration process, DV muscle fibers reconnect at the wound site, with longitudinal fibers and other muscle types gradually restoring the structure at the anterior tip and later integrating with circular and diagonal fibers through small DV fiber branches (Figure S5O1-O3).”

      (5) Line 423: The manuscript lacks evidence to claim glia guide muscle fiber branching.

      We agree with your concern that our statement may have been overstated. We have removed it from the revised version and instead focus on describing our observations of the connections between glial cells and muscle fibers. We have revised the section as follows: “Considering the interaction between glial and muscle cells, we further investigated the localization of estrella<sup>+</sup> glia and muscle fibers. By dual staining with anti-Phospho (Ser/Thr) and 6G10 in inr-1 RNAi and β-catenin-1 RNAi planarians, we found that neuronal morphology is normal and that neurons are in close contact with muscle fibers (Figure 6D, E). However, dual staining of estrella and 6G10 showed that glial cells are star-shaped in egfp RNAi planarians, whereas glial cells in inr-1 RNAi and β-catenin-1 RNAi planarians are smaller, have shorter cytoplasmic projections, and lack the major projection onto the muscles (Figure 6D, E, Figure S6E-K). In particular, in the posterior head of β-catenin-1 RNAi planarians, glial cells have few projections and can barely connect with muscle fibers (Figure 6E). These results indicate that proper neuronal guidance and muscle fiber distribution could potentially contribute to accurate glial-to-muscle projections.”

      (6) Lines 432/478: The conclusion about neuronal and muscle guidance on glial projections is similarly speculative, lacking functional evidence. It is possible that the morphological defects of estrella+ cells after bcat1 RNAi are caused by Wnt signaling directly acting on estrella+ cells independent of muscles or neurons.

      We understand that this approach is insufficient, and we have revised this section as follows: “Further investigation is required to distinguish the cell-autonomous and non-autonomous effects of inr-1 RNAi and β-catenin-1 RNAi on muscle and glial cells.”

      (7) Finally, several technical issues make the results difficult to interpret. For example, in line 125, cell boundaries appear to be determined using nucleus images; in line 136, the current resolution seems insufficient to reliably trace neural connections, at least based on the images presented.

      We use two setups for imaging cells and neuron projections. For cellular resolution imaging, we utilized a 1× air objective with a numerical aperture (NA) of 0.25 and a working distance of 60 mm (OLYMPUS MV PLAPO). The voxel size used was 0.8×0.8×2.5 μm<sup>3</sup>. This configuration resulted in a resolution of 2×2×5 μm<sup>3</sup> and a spatial resolution of 0.5×0.5×1.25 μm<sup>3</sup> with 4.5× isotropic expansion. Alternatively, for sub-cellular imaging, we employed a 10×0.6 SV MP water immersion objective with 0.8 NA and a working distance of 8 mm (OLYMPUS). The voxel size used in this configuration was 0.26×0.26×0.8 μm<sup>3</sup>. As a result of this configuration, we achieved a resolution of 0.5×0.5×1.6 μm<sup>3</sup> and a spatial resolution of 0.12×0.12×0.4 μm<sup>3</sup> with a 4.5× isotropic expansion. The higher resolution achieved with sub-cellular imaging allows us to observe finer structures and trace neural connections.
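      As a rough arithmetic illustration of how expansion relates optical resolution to effective resolution: under the idealized assumption of uniform, isotropic expansion, the resolution referred back to the unexpanded specimen is the optical resolution on each axis divided by the linear expansion factor. The short Python sketch below applies this relation to the two optical configurations quoted above (2×2×5 μm³ and 0.5×0.5×1.6 μm³) with the 4.5× factor. The function name is hypothetical, and real gels can depart from ideal expansion, so measured values need not match this simple quotient.

```python
def effective_resolution(optical_um, expansion_factor):
    """Return the per-axis resolution in pre-expansion units (um).

    Hypothetical helper. Assumes ideal, isotropic expansion: each axis
    of the optical resolution is simply divided by the linear
    expansion factor.
    """
    if expansion_factor <= 0:
        raise ValueError("expansion_factor must be positive")
    return tuple(r / expansion_factor for r in optical_um)


if __name__ == "__main__":
    # Optical resolutions (x, y, z) in um for the two imaging setups.
    configs = {
        "cellular (1x air objective)": (2.0, 2.0, 5.0),
        "sub-cellular (water immersion)": (0.5, 0.5, 1.6),
    }
    for name, optical in configs.items():
        eff = effective_resolution(optical, 4.5)
        print(f"{name}: " + " x ".join(f"{v:.2f}" for v in eff) + " um")
```

      Treating each axis independently keeps the sketch honest about anisotropic optics: the axial term is scaled separately, so a poorer z resolution stays poorer after expansion.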

      Regarding your question about cell boundaries, we have revised the manuscript to specify that the boundaries we identified are those of each nucleus.

      Reviewer #3 (Public review):

      Weaknesses:

      (1) The work would have been strengthened by a more careful consideration of previous literature. Many papers directly relevant to this work were not cited. Such omissions do the authors a disservice because in some cases, they fail to consider relevant information that impacts the choice of reagents they have used or the conclusions they are drawing.

      For example, when describing the antibody they use to label muscles (monoclonal 6G10), they do not cite the paper that generated this reagent (Ross et al PMCID: PMC4307677), and instead cite a paper (Cebria 2016) that does not mention this antibody. Ross et al reported that 6G10 does not label all body wall muscles equivalently, but rather "predominantly labels circular and diagonal fibers" (which is apparent in Figure S5A-D of the manuscript being reviewed here). For this reason, the authors of the paper showing different body wall muscle populations play different roles in body patterning (Scimone et al 2017, PMCID: PMC6263039, also not cited in this paper) used this monoclonal in combination with a polyclonal antibody to label all body wall muscle types. Because their "pan-muscle" reagent does not label all muscle types equivalently, it calls into question their quantification of the different body wall muscle populations throughout the manuscript. It does not help matters that their initial description of the body wall muscle types fails to mention the layer of thin (inner) longitudinal muscles between the circular and diagonal muscles (Cebria 2016 and citations therein).

      Ipsilateral and contralateral projections of the visual axons were beautifully shown by dye-tracing experiments (Okamoto et al 2005, PMID: 15930826). This paper should be cited when the authors report that they are corroborating the existence of ipsilateral and contralateral projections.

      Thank you for your feedback. We have incorporated these citations and clarifications into the revised manuscript. We acknowledge the limitations of this approach and have added a statement for this limitation in the revised manuscript “It is noted that previous studies reported that 6G10 does not label all body wall muscles equivalently with the limitation of predominantly labeling circular and diagonal fibers (Scimone et al., 2017; Ross et al., 2015). Our observation may be limited by this preference.”

      (2) The proportional decrease of neurons with growth in S. mediterranea was shown by counting different cell types in macerated planarians (Baguna and Romero, 1981; https://link.springer.com/article/10.1007/BF00026179) and earlier histological observations cited there. These results have also been validated by single-cell sequencing (Emili et al, bioRxiv 2023, https://www.biorxiv.org/content/10.1101/2023.11.01.565140v). Allometric growth of the planarian tail (the tail is proportionately longer in large vs. small planarians) can explain this decrease as animal size increases. The authors never really discuss allometric growth in a way that would help readers unfamiliar with the system understand this.

      Thank you for your feedback. We have incorporated these citations and clarifications into the revised manuscript: “These findings support the previous prediction and indicate consistency between different planarian species (Baguñà et al., 1981; Emili et al., 2023). Because the tail is proportionately longer in large than in small planarians, allometric growth may partly explain this decrease as animal size increases. The phenomenon may also suggest the existence of a threshold in the increase of planarian neuron numbers, which may ultimately contribute to some physiological changes, such as planarian fission.”

      (3) In some cases, the authors draw stronger conclusions than their results warrant. The authors claim that they are showing glial-muscle interactions, however, they do not provide any images of triple-stained samples labeling muscle, neurons, and glia, so it is impossible for the reader to judge whether the glial cells are interacting directly with body wall muscles or instead with the well-described submuscular nerve plexus. Their conclusion that neurons are unaffected by beta-cat or inr-1 RNAi based on anti-phospho-Ser/Thr staining (Fig. 6E) is unconvincing. They claim that during regeneration "DV muscles initially regenerate into longitudinal fibers at the anterior tip" (line 373). They provide no evidence for such switching of muscle cell types, so it is unclear why they say this.

      We acknowledge that some of our conclusions were overstated given the current data, and we appreciate the opportunity to clarify and refine these claims in the revised manuscript. For technical reasons, we have not yet been able to perform triple staining to address this concern; we hope to do so in future studies. Regarding the statement that "DV muscles initially regenerate into longitudinal fibers at the anterior tip" (line 373), as addressed in our previous response, this statement was unclear. Our intent was not to imply that DV muscles switch into longitudinal fibers. Instead, we observed that muscle fibers reconnect at the wound site, with longitudinal fibers and other muscle types gradually restoring the structure. We have revised this section: “During the regeneration process, DV muscle fibers reconnect at the wound site, with longitudinal fibers and other muscle types gradually restoring the structure at the anterior tip and later integrating with circular and diagonal fibers through small DV fiber branches (Figure S5O1-O3).”

      (4) The authors show how their automated workflow compares to manual counts using PI-stained specimens (Figure S1T). I may have missed it, but I do not recall seeing a similar ground truth comparison for their muscle fiber counting workflow. I mention this because the segmented image of the posterior muscles in Figure 4I seems to be missing the vast majority of circular fibers visible to the naked eye in the original image.

      Thank you for raising this important point. We have included a ground truth comparison of our automated muscle fiber segmentation with the original image in the revised Figure S6. The original Figure S6 has been changed as Figure S7. Regarding the observation of missing circular fibers in Figure 4I, we agree that the segmentation appears to have missed a significant number of circular fibers in this particular image. This may have been due to limitations in the current parameters of the segmentation algorithm, especially in distinguishing fibers in regions of varying intensity or overlap.

      (5) It is unclear why the abstract says, "We found the rate of neuron cell proliferation tends to lag..." (line 25). The authors did not measure proliferation in this work and neurons do not proliferate in planaria.

      Thank you for pointing out this mistake. What we intended to convey was the increase in neuron number during homeostasis. We have revised the abstract “We found that the increase in neuron cell number tends to lag behind the rapid expansion of somatic cells during the later phase of homeostasis.”

      (6) It is unclear what readers are to make of the measurements of brain lobe angles. Why is this a useful measurement and what does it tell us?

      The measurement of brain lobe angles is intended to provide a quantitative assessment of the growth and morphological changes of the planarian brain during regeneration. Additionally, the relevance of brain lobe angles has been explored in previous studies, such as Arnold et al., Nature, 2016, further supporting its use as a meaningful parameter.

      (7) The authors repeatedly say that this work lets them investigate planarians at the single-cell level, but they don't really make the case that they are seeing things that haven't already been described at the single-cell level using standard confocal microscopy.

      Thank you for your comment. We agree that single-cell level imaging has been previously achieved in planarians using conventional confocal microscopy. However, our goal was to extend the application of expansion microscopy by combining C-MAP with tiling light sheet microscopy (TLSM), which allows for faster and high-resolution 3D imaging of whole-mount planarians. We have added in the discussion section: “This combination offers several key advantages over standard techniques. For example, it enables high-throughput imaging across entire organisms with a level of detail and speed that is not easily achieved using confocal methods. This approach allows us to investigate the planarian nervous system at multiple developmental and regenerative stages in a more comprehensive manner, capturing large-scale structures while preserving fine cellular details. The ability to rapidly image whole planarians in 3D with this resolution provides a more efficient workflow for studying complex biological processes.”

    1. Author response:

      In view of the suggestions of the referees, we wish to underline that a user can interact with celldetective at two levels: a non-coder can analyse data and train models without coding, but is necessarily offered pre-determined choices with limited flexibility. An advanced user, however, has practically limitless flexibility to extend the fully open-source celldetective, aided by its modularity and detailed manual.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Torro et al. presented CellDetective, an open-source software designed for a user-friendly execution of single-cell segmentation, tracking, and analysis of time-lapse microscopy data. The authors demonstrated the applications of the software by measuring NK cell spreading events acquired with reflection interference contrast microscopy (RICM), as well as detecting target cell death events and their interaction with neighboring NK cells in a multichannel widefield microscopy dataset.

      Strengths:

      The segmentation (StarDist, Cellpose) and tracking (bTrack) modules implemented were based on existing and published software packages. The authors added the event detection, classification, and analysis modules to enable an end-to-end time-lapse microscopy data processing and analysis pipeline, complete with a graphical user interface (GUI). This minimizes the coding experience required from the user. The documentation that accompanies CellDetective is also adequate.

      Weaknesses:

      Given that the software was designed to improve user experience, such an approach also limits its scope and functionality, and it is currently capable of handling only very specific types of experiments. Additionally, this reviewer has also encountered many technical difficulties (see documented bugs/crashes below) that have prevented an extensive exploration of all the functionality of CellDetective.

      We apologize for the technical difficulties and bugs; the ones mentioned have been already corrected. New users have also tested the installation and reported it to be bug-free.

      We fully agree that a compromise has to be found between user experience and versatility. We have already tested celldetective in other biological contexts, such as microbiology, but chose to showcase immunological applications in this article. We invite the reader to consult the software documentation and online examples to learn about more options.

      Specifics:

      (1) The software can only handle 2D 'widefield' time-lapse imaging datasets. It should be noted that many studies that examine cell-cell interactions in vitro also used confocal microscopy and acquired the time-lapse images in 3D z-stacks to enable the reconstruction of entire cell volumes from multiple optical sections along the z-axis.

      Given that almost all of the implemented segmentation (StarDist, Cellpose) and tracking (bTrack) packages already support the handling of 3D datasets, it is unclear why CellDetective was designed to only work with 2D datasets.

      As noted above, extending the support for 3D images would allow the scope and utility of this software to be further extended for imaging studies acquired in z-stacks. As an example, the dense clustering of effector cells in Figure 4 had prevented accurate segmentation due to the 2D nature of the experimental dataset. More importantly, support for a 3D dataset could also allow for the tracking of fluorescent protein-based sub-cellular as well as membrane protein localization during cell-cell interactions.

      Furthermore, it also widens the potential applicability for analyzing datasets from 3D organoid imaging and perhaps even intravital two-photon microscopy.

      We thank the reviewer for this suggestion. Indeed, extension to three dimensions is a natural development, since we have chosen segmentation and tracking methods that are compatible with 3D. However, two important strengths of celldetective are harnessing the statistical power of cell populations together with multiplexing of biological conditions, and the dynamic analysis of fast events.

      For both, 2D is advantageous. Our own focus is on analyzing cellular events with minute time resolution, relevant in immunology. By our estimate (based on experience and the literature), 3D time-lapse acquisition would reduce the time resolution, as well as the throughput (in terms of events and conditions), to below an acceptable level. While we don’t envisage this upgrade in the immediate future, we encourage advanced users to contribute to further developing the open-source code in this direction. As a mitigation, a 2.5D approach on a flat sample that combines two z planes (for example, to address cell superposition) could be readily implemented with minimal changes.

      (2) The software in its current form only allows the broad demarcation of the cells examined into two populations: targets and effectors. This limits the number of cell populations that can be examined for their interactions. It might be more useful to just allow multiple user-defined populations instead of restricting the populations to target and effector cells only.

      We thank the reviewer for this suggestion. There is little architectural limitation to its implementation, and it will be offered in a future version. This updated version will allow more than two user-defined populations, labelled directly by the user, which will also facilitate the natural extension to more varied biological applications. Three-way interactions are much more complex and, to our knowledge, not currently addressed by biologists. For the moment, interactions will be limited to pairs of populations, as multipartite interactions require more extensive code modifications that are not immediately envisaged.

      (3) Similarly, subsetting of each of the populations could be made more intuitive. Although it is possible to define subsets of cells using the "Custom classification" function under the "Measure" module with user-defined parameters, visualization of multiple groups remains unintuitive and it appears that only one custom classified group can be selected and visualized at any given time in the Signal Annotator under Measurement instead of allowing visualization of multiple (custom defined) groups of cells in different colors. It is also unclear how, if possible at all, to visualize a custom group of cells in the Signal Annotator under the Detect Events module.

      The simultaneous visualization of several classes poses problems in the choice of colors and symbols, and may render the tool difficult to use. The time propagation option in the classification tool allows users to define event classes (as opposed to groups) that are compatible with the Signal Annotator. For more complex classifications, a simple solution is to use composite classifications, which are already supported through logical AND/OR operators on the condition defining the class. We believe that this feature is sufficient to address this issue.
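      As a minimal sketch of how a composite class can be built from logical AND/OR combinations of per-cell conditions, the Python below uses plain dictionaries as a stand-in for a measurement table. The helper names (`above`, `all_of`, `any_of`, `classify`), the thresholds and the data are hypothetical and are not part of celldetective's API; the feature names merely echo measurement names that appear elsewhere in this review.

```python
from typing import Callable, Dict, List

# Hypothetical per-cell measurements: one dict per cell, feature name -> value.
Cell = Dict[str, float]
Condition = Callable[[Cell], bool]


def above(feature: str, threshold: float) -> Condition:
    """Condition that is true when a measured feature exceeds a threshold."""
    return lambda cell: cell[feature] > threshold


def all_of(*conditions: Condition) -> Condition:
    """Logical AND of several conditions."""
    return lambda cell: all(cond(cell) for cond in conditions)


def any_of(*conditions: Condition) -> Condition:
    """Logical OR of several conditions."""
    return lambda cell: any(cond(cell) for cond in conditions)


def classify(cells: List[Cell], condition: Condition) -> List[int]:
    """Label each cell 1 if it satisfies the composite condition, else 0."""
    return [int(condition(cell)) for cell in cells]


# Illustrative data and thresholds (not taken from the paper).
cells = [
    {"area": 120.0, "dead_nuclei_channel_mean": 5.0},
    {"area": 300.0, "dead_nuclei_channel_mean": 40.0},
    {"area": 80.0, "dead_nuclei_channel_mean": 60.0},
]
large = above("area", 100.0)
dead = above("dead_nuclei_channel_mean", 30.0)
large_and_dead = classify(cells, all_of(large, dead))
large_or_dead = classify(cells, any_of(large, dead))
print(large_and_dead, large_or_dead)  # [0, 1, 0] [1, 1, 1]
```

      Because each combined condition is itself a condition, AND/OR clauses can be nested to any depth while still producing a single class label per cell.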

      Software issues:

      (4) When initially tested on v1.3.9, the Segment module could not be initiated (with the error message AttributeError: 'WindowsPath' object has no attribute 'endswith' when attempting to run segmentation).

      Update: this has been fixed in v1.3.9.post4 dated February 7th, 2025.

      (5) Further testing was then performed by downgrading the software to v1.3.1. While testing the ADCC demo experiment (https://celldetective.readthedocs.io/en/latest/adcc-example.html), the workflow was stuck at attempts to initiate the Detect Events step:

      AssertionError: No signal matches with the requirements of the model ['dead_nuclei_channel_mean', 'area']. Please pass the signals manually with the argument selected_signals or add measurements. Abort.

      (Update: fixed in the latest v1.3.9.post4 version dated February 7th, 2025)

      (6) Random bugs causing the software to crash. Example: switching characteristic to 'status_color' in the Signal Annotator under Measurement caused the software to crash (v1.3.9.post4):

      TypeError: ufunc 'isnan' is not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe'

      (7) Overall, when exploring the functionality of the software, there have been multiple instances of software crashes when clicking/switching around to show different parameters, etc.

      This reviewer understands the difficulties and time involved in bug fixing, and hopes that the experience can be made much smoother and the software more stable in order to maximize its usability.

      We apologize again for the various technical issues encountered during the review process, and thank the reviewer for mentioning that several bugs were already fixed in the last software release. The open-source development and software maintenance protocol enabled by GitHub should help to resolve any further issues that emerge.

      Reviewer #2 (Public review):

      Summary:

      Immune assays enable the analysis of immune responses in vitro. These assays generate time-series image data across several experimental conditions. The imaging parameters, such as the imaging modality and the number of channels, can vary across experiments. A challenge in the field is the lack of (open-source) tools to process and analyze these data. R. Torro et al. developed an open-source end-to-end pipeline for the analysis of image data from these immune assays. The pipeline is designed with a GUI and is suited for experimental biologists with no coding experience. The authors have incorporated several existing methods and tools for individual tasks, such as segmentation and cell tracking, and combined them with custom methods where necessary, such as for tracking cell state transitions.

      Strengths:

      (1) The tool is extremely well-documented and easy to install.

      (2) Applicable to a wide variety of imaging modalities and analysis.

      (3) There are several different options for each step, such as segmentation using traditional methods or deep learning methods, and all the analysis steps are integrated in one place with a GUI. The no-coding requirement makes this a very powerful tool for biologists and has the potential to enable a wide variety of analyses.

      Weakness:

      (1) It would be good to provide documentation on how to apply the tool to applications and analyses other than immune profiling, since most methods integrated here are applicable well beyond immune profiling. For example, a user might want to use the tool just for the segmentation of their IF microscopy images.

      This is an important suggestion that we will implement as short demonstrations using data from the public domain. These will be proposed as examples in the online documentation.

      (2) They applied Celldetective to two immune assays. The authors present the results from these assays and use the results to validate their assay. However, they have not included data that demonstrates results obtained via this pipeline are comparable to results obtained with other pipelines and/or if these results are consistent with what is expected in the literature.

      In the final version of the article, we shall compare celldetective with existing literature, including our previous work, when possible. However, we emphasize that most of the presented data are original and have no published equivalent in the literature. Concerning the immunotherapy assays, the data presented already show the expected trends (see for example Fig. 2 and Fig. 5). We reserve for future publications the systematic comparison with traditional (non-microscopy-based) methods, as we consider it out of scope here. Additionally, there is, to our knowledge, no existing open pipeline performing the full end-to-end analysis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This paper uses single-molecule FRET to investigate the molecular basis for the distinct activation mechanisms of two GPCRs that respond to the chemokine CXCL12: CXCR4, which couples to G proteins, and ACKR3, which is G protein-independent and displays higher basal activity.

      Strengths:

      It nicely combines the state-of-the-art techniques used in the studies of the structural dynamics of GPCR. The receptors are produced from eukaryotic cells, mutated, and labeled with single molecule compatible fluorescent dyes. They are reconstituted in nanodiscs, which maintain an environment as close as possible to the cell membrane, and immobilized through the nanodisc MSP protein, to avoid perturbing the receptor's structural dynamics by the use of an antibody for example.

      The smFRET data are analysed using hidden Markov modeling, and the number of states to be taken into account is evaluated using a Bayesian Information Criterion, which constitutes the state of the art for this task.
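      The Bayesian Information Criterion trades goodness of fit against model size, BIC = p ln(n) - 2 ln(L), where p is the number of free parameters, n the number of data points and L the maximised likelihood; the state count with the lowest BIC is retained. The Python sketch below assumes a one-dimensional hidden Markov model with Gaussian emissions (one mean and one variance per state); both this parameter counting and the log-likelihood values are illustrative assumptions, not the authors' analysis.

```python
import math


def gaussian_hmm_param_count(n_states: int) -> int:
    """Free parameters of a 1-D Gaussian HMM: a mean and a variance per state,
    n(n-1) free transition probabilities and n-1 free initial probabilities."""
    k = n_states
    return 2 * k + k * (k - 1) + (k - 1)


def bic(log_likelihood: float, n_params: int, n_points: int) -> float:
    """Bayesian Information Criterion, p*ln(n) - 2*ln(L); lower is better."""
    return n_params * math.log(n_points) - 2.0 * log_likelihood


def best_state_count(log_likelihoods: dict, n_points: int):
    """Return the number of states with the lowest BIC, plus all scores.
    log_likelihoods maps number of states -> maximised log-likelihood."""
    scores = {
        k: bic(ll, gaussian_hmm_param_count(k), n_points)
        for k, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get), scores


# Invented log-likelihoods for 1-4 state fits to 10,000 FRET data points.
fits = {1: -5000.0, 2: -4200.0, 3: -4100.0, 4: -4095.0}
best, scores = best_state_count(fits, 10_000)
print(best)  # 3
```

      In this invented example the fourth state improves the log-likelihood only slightly, so its larger parameter penalty makes the BIC worse and three states are retained.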

      The data show convincingly that the activation of the CXCR4 and ACKR3 by an agonist leads to a shift from an ensemble of high FRET states to an ensemble of lower FRET states, consistent with an increase in distance between the TM4 and TM6. The two receptors also appear to explore a different conformational space. A wider distribution of states is observed for ACKR3 as compared to CXCR4, and it shifts in the presence of agonists toward the active states, which correlates well with ACKR3's tendency to be constitutively active. This interpretation is confirmed by the use of the mutation of Y254 to leucine (the corresponding residue in CXCR4), which leads to a conformational distribution that resembles the one observed with CXCR4. It is correlated with a decrease in constitutive activity of ACKR3.

      Weaknesses:

      Although the data overall support the claims of the authors, there are, in my opinion, some details in the data analysis and interpretation that should be modified, clarified, or discussed.

      Concerning the amplitude of the changes in FRET efficiency: the authors do not provide any structural information on the amplitude of the FRET changes that are expected. To me, a FRET change from ~0.9 to ~0.1 looks very large for a distance change that is expected to be only a few angstroms for the movement of TM6. Can the authors give an explanation for that? How does this FRET change relate to those observed with other GPCRs modified at the same or equivalent positions on TM4 and TM6?

      The large FRET change in our system was initially unexpected. However, the reviewer is mistaken that the expected distance change is only a few angstroms. Crystal structures of the homologous beta2 adrenergic receptor (β<sub>2</sub>AR) in inactive and active conformations reveal that the cytoplasmic end of TM6 moves outwards by 16 angstroms during activation (Rasmussen et al., 2011, ref 47).  Consistent with this, smFRET studies of β<sub>2</sub>AR labeled in TM4 and TM6 (as here) showed that the donor-acceptor (D-A) distance was 14 angstroms longer in the active conformation (Gregorio et al., ref 38).  Surprisingly, the apparent distance change in our system (calculated for our FRET probes, A555/Cy5, using FPbase.com) is almost 30 angstroms. A possible explanation is that the fluorophore attached to TM6 interacts with lipids within the nanodisc when TM6 moves outwards, which could stretch the fluorophore linker and thereby increase the D-A distance (lipids were absent in the β<sub>2</sub>AR study). Such an interaction could also constrain the fluorophore in an unfavorable orientation for energy transfer, also leading to lower than expected FRET efficiencies and inflated distance calculations. Regardless, it is important to emphasize that none of the interpretations or conclusions of our study are based on computed D-A distances. Rather, we resolved different receptor conformations and quantified their relative populations based on the measured FRET efficiency distributions.

      Finally, we note that a recent smFRET study of the glucagon receptor (labeled in TM4 and TM6, as here) also revealed a large difference in apparent FRET efficiencies between inactive (E<sub>app</sub> = 0.83) and active (E<sub>app</sub> = 0.32) conformations (Kumar et al., ref. 39). Thus, the large change in FRET efficiency observed in our study is not unprecedented.
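      Apparent distances of this kind follow from the Förster relation E = 1/(1 + (r/R0)^6), which inverts to r = R0 (1/E - 1)^(1/6). The Python sketch below assumes an illustrative Förster radius of 50 Å; the actual value for the A555/Cy5 pair (for example, from FPbase) should be substituted. The relation also assumes the orientation factor built into R0, which, as discussed above, may not hold if a fluorophore is constrained.

```python
# Illustrative Förster radius in angstroms (an assumption, not the value used in the study).
R0_ANGSTROM = 50.0


def fret_efficiency(distance: float, r0: float = R0_ANGSTROM) -> float:
    """FRET efficiency for a donor-acceptor distance: E = 1 / (1 + (r/R0)^6)."""
    return 1.0 / (1.0 + (distance / r0) ** 6)


def fret_distance(efficiency: float, r0: float = R0_ANGSTROM) -> float:
    """Invert the Förster relation: r = R0 * (1/E - 1)^(1/6), for 0 < E < 1."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# Apparent distance change between the high- and low-FRET states (~0.9 -> ~0.1).
delta = fret_distance(0.1) - fret_distance(0.9)
print(f"apparent change: {delta:.1f} A for R0 = {R0_ANGSTROM} A")
```

      Because r depends on E only through (1/E - 1)^(1/6), going from E ≈ 0.9 to E ≈ 0.1 corresponds to about 0.75 R0 for any dye pair, so such a shift implies a sizeable apparent distance change whatever the exact Förster radius.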

      Concerning the intermediate states: the authors observe several intermediate states.

      (1) First I am surprised, looking at the time traces, by the dwell times of the transitions between the states, which often last several seconds. Is such a long transition time compatible with what is known about the kinetic activation of these receptors?

      We too were surprised by the apparent kinetics of the receptors in our system. However, it was previously noted that purified systems, including nanodiscs, lead to slower activation times for GPCRs compared to cellular membrane systems (Lohse et al., Curr. Opin. Cell Biol., 27, 87-92, 2014). Indeed, slow transitions among different FRET states (dwell times in the seconds range) were also observed in recent smFRET studies of the mu opioid receptor (Zhao et al., 2024, ref. 41) and the glucagon receptor (Kumar et al., 2023, ref. 39). These observations are consistent with the time scale of the FRET transitions reported here.

      (2) Second, is it possible that these “intermediate” states correspond to differences in FRET efficiency that arise from different photophysical states of the dyes? Alexa555 and Cy5 are cyanines, which are known to be very sensitive to their local environment. This could lead to different quantum yields and therefore different FRET efficiencies for a similar distance. In addition, the authors use statistical labeling of two cysteines, and therefore have in their experiment a mixture of receptors in which the donor and acceptor are swapped and can therefore experience different environments. The authors do not speculate structurally on what these intermediate states could be, which is appreciated, but I think they should nevertheless discuss the potential issue of fluorophore photophysics effects.

      The reviewer is correct that the intermediate FRET states could, in principle, arise from a conformational change of the receptor that alters the local environment of the donor and/or acceptor fluorophores, rather than a change in donor-acceptor distance. This caveat is now included in the discussion on Pg. 10:

      “In principle, the intermediates in CXCR4 and ACKR3 could represent partial movements of TM6 from the inactive to active conformation or more subtle conformational changes altering the photophysical characteristics of the probes without drastically altering the donor-acceptor distance. Either possibility leads to detectable changes in apparent FRET efficiency and would reflect discrete conformational steps on the activation pathway; however, it is not possible to resolve specific structural changes from the data.”

      Regarding the second possibility, it is true that our labeling methodology leads to a statistical mixture of labeled species (D on TM6 and A on TM4, D on TM4 and A on TM6). If the photophysical properties of the fluorophores were markedly different for the two labeling orientations, this would produce two different FRET efficiencies for a given receptor conformation. Assuming two receptor conformations, this scenario would produce four distinct FRET states: E<sub>1</sub> (inactive receptor, labeling configuration 1), E<sub>2</sub> (active receptor, labeling configuration 1), E<sub>3</sub> (inactive receptor, labeling configuration 2) and E<sub>4</sub> (active receptor, labeling configuration 2), with two cross peaks in the TDP plots, corresponding to E<sub>1</sub> ↔ E<sub>2</sub> and E<sub>3</sub> ↔ E<sub>4</sub> transitions. Notably, E<sub>2</sub> ↔ E<sub>3</sub> cross peaks would not be present, since states E<sub>2</sub> and E<sub>3</sub> exist on separate molecules. Instead, we see all states inter-connected sequentially, R ↔ R’ ↔ R* in CXCR4 and R ↔ R’ ↔ R*’ ↔ R* in ACKR3 (Fig. 2), suggesting that the resolved FRET states represent interconnected conformational states.

      We added the following text to the Results section on Pg. 6:

      “Two-dimensional transition density probability (TDP) plots revealed that the three FRET states were connected in a sequential fashion (Figs. 2A & B), indicating that the transitions occurred within the same molecules. Notably, these observations exclude the possibility that the midFRET state arises from different local fluorophore environments (hence FRET efficiencies) for the two possible labeling orientations of the introduced cysteines: assuming two receptor conformations, this model would produce four distinct FRET states, but only two cross peaks in the TDP plot.”
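      The argument that a labeling-orientation artifact would leave the FRET states in disconnected pairs can be checked by enumeration. The Python sketch below (helper names are hypothetical) builds the transitions allowed when each FRET state is a (conformation, labeling orientation) pair and a molecule can change conformation but never its labeling, then compares the connectivity with the sequential R ↔ R’ ↔ R* scheme reported for CXCR4. Each unordered pair of states is counted as one cross peak, as in the text, although a TDP plot shows it as two symmetric off-diagonal peaks.

```python
from itertools import product


def labeling_model(conformations, orientations):
    """States are (conformation, labeling orientation) pairs. A molecule can
    change conformation but never its labeling orientation, so transitions
    only link states that share an orientation."""
    states = list(product(conformations, orientations))
    transitions = {
        frozenset((a, b))
        for a, b in product(states, repeat=2)
        if a != b and a[1] == b[1]
    }
    return states, transitions


def connected_components(states, transitions):
    """Group states linked by chains of transitions (iterative depth-first search)."""
    neighbours = {s: set() for s in states}
    for pair in transitions:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in states:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


# Two receptor conformations x two labeling orientations.
states, transitions = labeling_model(["inactive", "active"], [1, 2])
model_components = connected_components(states, transitions)
print(len(states), len(transitions), len(model_components))  # 4 2 2

# Sequential scheme reported for CXCR4: R <-> R' <-> R*.
observed = ["R", "R'", "R*"]
observed_transitions = {frozenset(("R", "R'")), frozenset(("R'", "R*"))}
print(len(connected_components(observed, observed_transitions)))  # 1
```

      The labeling model splits into two isolated pairs of states, whereas the observed transitions form a single connected chain, which is the distinction the argument relies on.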

      (3) It would also have been nice to discuss whether these types of intermediate states have been observed in other studies by smFRET on GPCR labeled at similar positions.

      Intermediate states have also been reported in previous smFRET studies of other GPCRs. For example, in the glucagon receptor (also labeled in TM4 and TM6), a third FRET state (E<sub>app</sub> =  0.63) was resolved between the inactive (E<sub>app</sub>  = 0.85) and active (E<sub>app</sub>  = 0.32) states (Kumar et al., Ref. 39).  Discrete intermediate receptor conformations were also observed in the A<sub>2A</sub>R labeled in TM4 and TM6 (Fernandes et al., Ref 40). These examples are now cited in the Discussion.

      On line 239, the authors note that R↔R' transitions are more probable. In fact, it is more striking that the R'↔R* transition appears in the plot. This transition is a signature of the behavior observed in the presence of an agonist, although IT1t is supposed to be an inverse agonist. This observation is consistent with the unexpected (for an inverse agonist) shift in the FRET histogram distribution. In fact, it appears that all CXCR4 antagonists or inverse agonists have a similar (although smaller) effect to the agonist. Is this because these (antagonist or inverse agonist) ligands lead to a conformation that is similar to that induced by agonists, but cannot interact with the G protein? A very interesting experiment here would be to repeat these measurements in the presence of purified G protein. G protein has been shown to shift the conformational space explored by GPCRs toward the active state (using smFRET on class A and class C GPCRs). It would be interesting to explore its role on CXCR4 in the presence of these various ligands. Although I am aware that this experiment might go beyond the scope of this study, I think this point should nevertheless be discussed.

      We thank the reviewer for this observation and the possible explanation offered.  In response, we have added the following text to the Results section on Pg. 7:

      “The small-molecule ligand IT1t is reported to act as an inverse agonist of CXCR4 (54-56). However, the conformational distribution of CXCR4 showed little change to the overall apparent FRET profile, although R’ ↔ R* transitions appeared in the TDP plot (Figs. 3A & B, Fig. S8). This suggests that the small molecule does not suppress CXCR4 basal signaling by changing the conformational equilibrium. Instead IT1t appears to increase transition probabilities which may impair G protein coupling by CXCR4.”

      We have also added the following text to the Results on Pg. 8:

      “Despite the ability of CXCL12<sub>P2G</sub> and CXCL12<sub>LRHQ</sub> to stabilize the active R* conformation of CXCR4, both variants are known to act as antagonists (20). This suggests that the CXCL12 mutants inhibit CXCR4 coupling to G proteins not by suppressing the active receptor population but rather by increasing the dynamics of the receptor state transitions. Our results suggest that the helical movements considered classic signatures of the active state may not be sufficient for CXCR4 to engage productively with G proteins.”

      In addition, we have added the following text to the Discussion on Pg. 11:

      “The chemokine variants CXCL12<sub>P2G</sub> and CXCL12<sub>LRHQ</sub> are reported to act as antagonists of CXCR4 (19, 20), and the small molecule IT1t acts as an inverse agonist (54-56). Surprisingly, none of these ligands inhibit formation of the active R* conformation of CXCR4. In fact, the chemokine variants both stabilize and increase this state to some degree, although less effectively than CXCL12<sub>WT</sub>. Thus, the antagonism and inverse agonism of these ligands does not appear to be linked exclusively to receptor conformation, suggesting that the ligands inhibit coupling of G proteins to CXCR4 or disrupt the ligand-receptor-G protein interaction network required for signaling (Fig. S10) (21, 23).  Interestingly, these ligands also increase the probabilities of state-to-state transitions (Figs. 3B & 4B), suggesting that enhanced conformational exchange prevents the receptor from productively engaging G proteins. Similarly, ACKR3 is naturally dynamic and lacks G protein coupling, suggesting a common mechanism of G protein antagonism.”

      Finally, we also agree that experiments with G proteins could be informative. In fact, we initiated such experiments during the course of this study.  However, it soon became apparent that significant optimization would be required to identify fluorophore labeling positions that report receptor conformation without inhibiting G protein coupling. Accordingly, we decided that G protein experiments would be the subject of future studies.

      However, we added the following text to the Discussion on Pg. 12:

      “Future smFRET studies performed in the presence of G proteins should be informative in this regard”.

      The authors also mentioned in Figure 6 that the energetic landscape of the receptors is relatively flat ... I do not really agree with this statement. For me, a flat conformational landscape would be one where the receptors are able to switch very rapidly between the states (typically in the submillisecond timescale, which is the timescale of protein domain dynamics). Here, the authors observed that the transition between states is in the second timescale, which for me implies that the transition barrier between the states is relatively high to preclude the fast transitions.

      We thank the reviewer for the comment. We have modified the description of the energy landscapes of ACKR3 and CXCR4 in the discussion on Pg. 10 as follows:

      “These observations imply that ACKR3 has a relatively flat energy landscape, with similar energy minima for the different conformations, whereas the energy landscape of CXCR4 is more rugged (Fig. 6). For both receptors, the energy barriers between states are sufficiently high that transitions occur relatively slowly with seconds long dwell times (Figs. 1C and S2).”
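      The distinction drawn here, between the depth of the minima (set by equilibrium populations) and the height of the barriers (set by transition rates), can be made quantitative with two textbook relations: ΔG_i = -RT ln(p_i/p_ref) for the minima and the Eyring equation k = κ (k_B T/h) exp(-ΔG‡/RT) for the barriers. The Python sketch below is illustrative only. It assumes the populations are at equilibrium and uses invented values, and an Eyring prefactor with κ = 1 is a crude choice for protein conformational changes, so the absolute barrier height is only indicative.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol K)
K_B = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s


def relative_free_energies(populations, temperature=298.15):
    """Free energy of each state relative to the most populated one (kJ/mol),
    assuming equilibrium populations: dG_i = -RT ln(p_i / p_max)."""
    rt = R_KJ * temperature
    p_max = max(populations)
    return [-rt * math.log(p / p_max) for p in populations]


def eyring_barrier(rate_per_s, temperature=298.15, transmission=1.0):
    """Apparent activation free energy (kJ/mol) from a rate constant, obtained by
    inverting the Eyring equation k = kappa * (kB T / h) * exp(-dG / RT)."""
    rt = R_KJ * temperature
    prefactor = transmission * K_B * temperature / H
    return rt * math.log(prefactor / rate_per_s)


# Invented populations: a near-flat and a rugged landscape.
flat = relative_free_energies([0.30, 0.25, 0.25, 0.20])
rugged = relative_free_energies([0.85, 0.10, 0.05])
print([round(g, 2) for g in flat], [round(g, 2) for g in rugged])
# Seconds-long dwell times (exit rate of roughly 1 per second) imply a high barrier.
print(round(eyring_barrier(1.0), 1), "kJ/mol")
```

      Similar populations give minima within about 1 kJ/mol of each other, while slow, seconds-scale exchange still implies barriers of tens of kJ/mol, so a landscape can be flat in its minima yet slow to traverse.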

      Reviewer #2 (Public Review):

      Summary:

      This manuscript uses single-molecule fluorescence resonance energy transfer (smFRET) to identify differences in the molecular mechanisms of CXCR4 and ACKR3, two seven-transmembrane receptors that both respond to the chemokine CXCL12 but otherwise have very different signaling profiles. CXCR4 is highly selective for CXCL12 and activates heterotrimeric G proteins. In contrast, ACKR3 is quite promiscuous and does not couple to G proteins, but like most G protein-coupled receptors (GPCRs), it is phosphorylated by GPCR kinases and recruits arrestins. By monitoring FRET between two positions on the intracellular face of the receptor (which highlights the movement of transmembrane helix 6 [TM6], a key hallmark of GPCR activation), the authors show that CXCR4 remains mostly in an inactive-like state until CXCL12 binds and stabilizes a single active-like state. ACKR3 rapidly exchanges among four different conformations even in the absence of ligands, and agonists stabilize multiple activated states.

      Strengths:

      The core method employed in this paper, smFRET, can reveal dynamic aspects of these receptors (the breadth of conformations explored and the rate of exchange among them) that are not evident from static structures or many other biophysical methods. smFRET has not been broadly employed in studies of GPCRs. Therefore, this manuscript makes important conceptual advances in our understanding of how related GPCRs can vary in their conformational dynamics.

      Weaknesses:

      (1) The cysteine mutations in ACKR3 required to site-specifically install fluorophores substantially increase its basal and ligand-induced activity. If, as the authors posit, basal activity correlates with conformational heterogeneity, the smFRET data could greatly overestimate the conformational heterogeneity of ACKR3.

      The changes in basal ACKR3 activity caused by the Cys introductions are modest in comparison and not significantly different, as determined by an extra-sum-of-squares F test (P = 0.14).
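      The extra-sum-of-squares F test compares nested least-squares fits, for example one curve shared by the wild-type and Cys-mutant receptors against separate curves for each: F = ((SS_simple - SS_complex)/(df_simple - df_complex)) / (SS_complex/df_complex). The response does not state which fits were compared, so the Python sketch below assumes that setup and uses invented residual sums of squares; the P value comes from `scipy.stats.f.sf`.

```python
from scipy.stats import f


def extra_sum_of_squares_f_test(ss_simple, df_simple, ss_complex, df_complex):
    """Compare nested least-squares fits.

    ss_* are residual sums of squares and df_* residual degrees of freedom
    (number of points minus number of fitted parameters). The simpler model
    (e.g. one shared curve) must have more residual df than the complex one.
    Returns (F statistic, P value).
    """
    if df_simple <= df_complex:
        raise ValueError("simple model must have more residual degrees of freedom")
    dfn = df_simple - df_complex
    f_stat = ((ss_simple - ss_complex) / dfn) / (ss_complex / df_complex)
    p_value = f.sf(f_stat, dfn, df_complex)
    return f_stat, p_value


# Invented example: a shared fit (20 residual df) vs separate fits (18 residual df).
f_stat, p_value = extra_sum_of_squares_f_test(10.5, 20, 10.0, 18)
print(f"F = {f_stat:.2f}, P = {p_value:.3f}")
```

      A large P value means the extra parameters of the separate fits do not reduce the residuals more than expected by chance, so the simpler shared model is retained.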

      (2) The probes used cannot reveal conformational changes in other positions besides TM6. GPCRs are known to exhibit loose allosteric coupling, so the conformational distribution observed at TM6 may not fully reflect the global conformational distribution of receptors. This could mask important differences that determine the ability of intracellular transducers to couple to specific receptor conformations.

      We agree that the overall conformational landscape of the receptors has not been investigated and we have added this caveat to the discussion on Pg. 12.

      “An important caveat is that our study does not report on the dynamics of the other TM helices and H8, some of which are known to participate in arrestin interactions.”

      (3) While it is clear that CXCR4 and ACKR3 have very different conformational dynamics, the data do not definitively show that this is the main or only mechanism that contributes to their functional differences. There is little discussion of alternative potential mechanisms.

      The main functional difference between CXCR4 and ACKR3 is their effector coupling: CXCR4 couples to G proteins, whereas ACKR3 only couples to arrestins (following phosphorylation of the C-terminal tail by GRKs). As currently noted in the discussion, ACKR3 has many features that may contribute to its lack of G protein coupling, including lack of a well-ordered intracellular pocket due to conformational dynamics, lack of an N-term-ECL3 disulfide, a different chemokine binding mode, and the presence of Y257. Different ICL structures may also sterically interfere with G protein activation. No single feature has been shown to confer G protein activity on ACKR3, not even swapping all of its ICLs for those of a canonical chemokine receptor, suggesting that a combination of these factors is responsible. The following has been added to the discussion on Pg. 13 to clearly note that any one feature is unlikely to drive the atypical behavior of ACKR3:

      “The atypical activation of ACKR3 does not appear to be dependent on any singular receptor feature and is likely a combination of several factors.”

      (4) The extent to which conformational heterogeneity is a characteristic feature of ACKRs that contributes to their promiscuity and arrestin bias is unclear. The key residue the authors find promotes ACKR3 conformational heterogeneity is not conserved in most other ACKRs, but alternative mechanisms could generate similar heterogeneity.

      Despite the commonalities in the roles of the ACKRs, they all appear to have evolved independently. Thus, we do not believe that all features observed and described for one ACKR will explain the behavior of another. We have carefully avoided expanding our observations to other ACKRs to avoid suggesting common mechanisms.

      (5) There are no data to confirm that the two receptors retain the same functional profiles observed in cell-based systems following in vitro manipulations (purification, labeling, nanodisc reconstitution).

      We agree this is an important point. All labeled receptors responded to agonist stimulation as expected. As only properly folded receptors are able to make the extensive interactions with ligands necessary for conformational changes (for instance, CXCL12 interacts with all TMs and ECLs), this suggests that the proteins are folded correctly and functional following all manipulations.

      Reviewer #3 (Public Review):

      Summary:

      This is a well-designed and rigorous comparative study of the conformational dynamics of two chemokine receptors, the canonical CXCR4 and the atypical ACKR3, using single-molecule fluorescence spectroscopy. These receptors play a role in cell migration and may be relevant for developing drugs targeting tumor growth in cancers. The authors use single-molecule FRET to obtain distributions of a specific intramolecular distance that changes upon activation of the receptor and track differences between the two receptors in the apo state, and in response to ligands and mutations. The picture emerging is that more dynamic conformations promote more basal activity and more promiscuous coupling of the receptor to effectors.

      Strengths:

      The study is well designed to test the main hypothesis, the sample preparation and the experiments conducted are sound and the data analysis is rigorous. The technique, smFRET, allows for the detection of several substates, even those that are rarely sampled, and it can provide a "connectivity map" by looking at the transition probabilities between states. The receptors are reconstituted in nanodiscs to create a native-like environment. The examples of raw donor/acceptor intensity traces and FRET traces look convincing and the data analysis is reliable to extract the sub-states of the ensemble. The role of specific residues in creating a more flat conformational landscape in ACKR3 (e.g., Y257 and the C34-C287 bridge) is well documented in the paper.

      Weaknesses:

      The kinetics side of the analysis is mentioned, but not described and discussed. I am not sure why since the data contains that information. For instance, it is not clear if greater conformational flexibility is accompanied by faster transitions between states or not.

      The reviewer is correct that kinetic information is available, in principle, from smFRET experiments. However, a detailed kinetic analysis will require a much larger data set than we currently possess, to adequately sample all possible transitions and the dwell times of each FRET state. We intend to perform such an analysis in the future as more data becomes available. The purpose of this initial study was to explore the conformational landscapes of CXCR4 and ACKR3 and to reveal differences between them. To this end, we have documented major differences in conformational preferences and response to ligands of the two receptors that are likely relevant to their different biological behavior. Future kinetic information will add further detail, but is not expected to alter the conclusions drawn here.
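
      To illustrate why the amount of data matters, a minimal sketch of how rate constants are commonly estimated once each trace has been idealized (e.g. by HMM) into integer state labels sampled at a fixed frame time. It uses the continuous-time Markov estimate k_ij = N_ij / T_i (observed i-to-j transitions divided by total time spent in state i), which is only approximate when dwell times approach the frame time; the function name, frame time and toy traces are hypothetical and do not represent the authors' pipeline. The precision of each k_ij is governed by the count N_ij, so rarely visited states and rare transitions need many molecules.

```python
from collections import defaultdict


def estimate_rates(paths, frame_time_s):
    """Approximate rate constants (s^-1) from idealized smFRET state paths.

    Hypothetical helper for illustration. paths: list of lists of integer
    state labels, one list per molecule. frame_time_s: time per frame in
    seconds (assumed constant). Returns {(i, j): k_ij} with
    k_ij = N_ij / T_i; reasonable only when dwells are much longer than a frame.
    """
    counts = defaultdict(int)      # (i, j) -> number of observed i->j transitions
    time_in = defaultdict(float)   # i -> total time spent in state i (seconds)
    for path in paths:
        for frame, state in enumerate(path):
            time_in[state] += frame_time_s
            if frame > 0 and path[frame - 1] != state:
                counts[(path[frame - 1], state)] += 1
    return {pair: n / time_in[pair[0]] for pair, n in counts.items()}


# Two toy idealized traces (hypothetical), 100 ms per frame
rates = estimate_rates([[0, 0, 0, 1, 1, 0, 0, 0], [1, 1, 1, 1, 0, 0]], 0.1)
print(rates)
```

      Here state 0 is occupied for 0.8 s in total with one exit, so its single observed transition gives k_01 of about 1.25 s^-1; with so few events the uncertainty on such estimates is large.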

      The method to choose the number of states seems reasonable, but the "similarity" of states argument (Figures S4 and S6) is not that clear.

      We thank the reviewer for noting a need for further clarification. We qualitatively compared the positions of the various FRET peaks across treatments to gain insight into the consistency of the conformations and avoid splitting real states by overfitting the data. For instance, fitting the ACKR3 treatments with three states leads to three distinct FRET populations for the R’ intermediate. Adding a fourth state results in two intermediates that are fairly well overlapping. In contrast, the two-intermediate model for CXCR4 appears to split the R* state of the CXCL12-treated sample and causes a general shift in both intermediate states to lower FRET values when CXCL12 is present. As we assume that the conformations are consistent throughout the treatments, we conclude that this represents an overfitting artifact and not a novel CXCL12-CXCR4 R*’ state. Additional sentences have been added to the supplemental figure legend to better describe the comparative analysis.

      “(Top) With the 3-state model, the R’ states for apo-CXCR4 and for CXCL12- and IT1t-bound receptor overlapped well with similar apparent FRET values across all of the tested conditions. In the case of the four-state model, the R*’ (Middle) and R’ (Bottom) states were substantially different across the ligand treatments. In particular, the R*’ state with CXCL12 treatment appears to arise from a splitting of the R* conformation, indicating that the model was overfitting the data.”

      Also, the "dynamics" explanation offered for ACKR3's failure to couple and activate G proteins is not very convincing. In other studies, it was shown that activation of GPCRs by agonists leads to an increase in local dynamics around the TM6 labelling site, but that did not prevent G protein coupling and activation.

      We agree with the reviewer that any single explanation for ACKR3 bias, including the dynamics argument presented here, is insufficient to fully characterize the ACKR3 responses. As noted by the reviewer, TM6 movement and dynamics are generally correlated with G protein coupling, whereas other dynamics studies (Wingler et al. Cell 2019) have noted that arrestin-biased ligands do not lead to the same degree of TM6 movement. We have added the following statement to the discussion on Pg. 13:

      “The atypical activation of ACKR3 does not appear to be dependent on any singular receptor feature and is likely a combination of several factors.” 

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      I would like to raise a technical point about the calculation and reporting of the FRET efficiency. The authors report the FRET efficiency as E=IA/(IA+ID). There is now a strong recommendation from the FRET community (https://doi.org/10.1038/s41592-018-0085-0) to use the term “FRET efficiency” only when all correction factors have been properly determined and applied, which is not the case here (the gamma factor has not been calculated). The authors should therefore use the term “Apparent FRET Efficiency” and E<sub>app</sub> throughout the manuscript.

      Also, it would be nice to indicate directly on the figures whether a ligand that is used is an agonist, antagonist, inverse agonist, etc...

      We thank the reviewer for suggesting this clarification in terminology. We now refer to apparent FRET efficiency (or E<sub>app</sub>) throughout the manuscript and in the figures. In addition, we have added ligand descriptions to the relevant figures.
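
      To make the distinction concrete, a minimal sketch, assuming background-subtracted donor and acceptor intensities recorded under donor excitation. The function names are hypothetical; the corrected form follows the usual leakage and gamma correction scheme and, as a simplifying assumption, omits direct acceptor excitation (which requires alternating excitation to measure).

```python
def apparent_fret(i_acceptor, i_donor):
    """Uncorrected (apparent) FRET efficiency, E_app = I_A / (I_A + I_D)."""
    return i_acceptor / (i_acceptor + i_donor)


def corrected_fret(i_acceptor, i_donor, gamma, leakage=0.0):
    """Leakage- and gamma-corrected FRET efficiency (illustrative sketch).

    i_acceptor, i_donor: background-subtracted intensities under donor excitation.
    leakage (alpha): fraction of donor emission detected in the acceptor channel.
    gamma: (acceptor detection efficiency x quantum yield) divided by the same
    product for the donor. Direct acceptor excitation is ignored (assumption).
    """
    f_acceptor = i_acceptor - leakage * i_donor
    return f_acceptor / (f_acceptor + gamma * i_donor)


print(apparent_fret(800.0, 200.0))               # identical intensities...
print(corrected_fret(800.0, 200.0, gamma=2.0))   # ...give a different E once gamma != 1
```

      With gamma = 1 and no leakage the two expressions coincide, which is why E<sub>app</sub> remains a consistent relative measure across conditions even though it is not an absolute efficiency.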

      Reviewer #2 (Recommendations For The Authors):

      (1) M159(4.40)C/Q245(6.28)C ACKR3 appears to have higher constitutive activity than ACKR3 Wt (Fig. S1). While the vehicle point itself is likely not significant due to the error in the Wt, the overall trend is clear and arguably even stronger than the effect of Y257(6.40)L (Fig. S9). While this is an inherent limitation of the method used, it should be clearly acknowledged; the comment in lines 162-164 seems to skirt the issue by only saying that arrestin recruitment is retained. It would be helpful and more rigorous to report the curve fit parameters (basal, E<sub>max</sub>, EC50) for the arrestin recruitment experiments and the associated errors/significance (see https://www.graphpad.com/guides/prism/latest/statistics/stat_qa_multiple_comparisons_after_.htm for a discussion).

      The E<sub>min</sub>, E<sub>max</sub>, and EC50 for M159<sup>4.40</sup>C/Q245<sup>6.28</sup>C ACKR3 were compared against the values for WT ACKR3 from Fig. S1, and only the E<sub>max</sub> was determined to be significantly different by the extra sum-of-squares F test. A note has been added to the text to reflect these results on Pg. 5.

      “Only the E<sub>max</sub> for arrestin recruitment to CXCL12-stimulated ACKR3 was significantly altered by the mutations, while all other pharmacological parameters were the same as for WT receptors.”
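
      The extra sum-of-squares F test compares a simpler model (for example, one E<sub>max</sub> shared by WT and mutant) with a more flexible nested model (separate E<sub>max</sub> values). A minimal sketch of the statistic, assuming the residual sums of squares and residual degrees of freedom of both least-squares fits are already known; the function name and the numbers below are hypothetical, not values from this study.

```python
def extra_sum_of_squares_f(ss_null, df_null, ss_alt, df_alt):
    """F statistic for two nested least-squares fits (hypothetical helper).

    ss_null, df_null: residual sum of squares and residual degrees of freedom
    of the simpler model (e.g. one E_max shared by WT and mutant).
    ss_alt, df_alt: the same for the more flexible model (separate E_max).
    The statistic follows an F(df_null - df_alt, df_alt) distribution under
    the null hypothesis that the extra parameter is not needed.
    """
    if df_null <= df_alt:
        raise ValueError("the simpler model must have more residual degrees of freedom")
    extra_per_df = (ss_null - ss_alt) / (df_null - df_alt)
    return extra_per_df / (ss_alt / df_alt)


# Hypothetical fit summaries, for illustration only
f_stat = extra_sum_of_squares_f(ss_null=12.0, df_null=21, ss_alt=9.0, df_alt=20)
print(round(f_stat, 3))
```

      The p-value is the upper tail of the F distribution, which can be obtained with, for example, `scipy.stats.f.sf(f_stat, 1, 20)`; a small p-value favors the model with separate E<sub>max</sub> values.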

      (2) The methods do not specify the reactive group of the dyes used for labeling (i.e., AlexaFluor 555-maleimide and Cy5-maleimide?).

      We regret the omission and have added the necessary details to the materials and methods.

      (3) Were any of the native Cys residues removed from ACKR3 and CXCR4 in the constructs used for smFRET? ACKR3 appears to have two additional Cys residues in the N-terminus besides the one involved in the second disulfide bridge, and these would presumably be solvent-exposed. If so, please specify in the Methods and clarify whether the constructs tested in functional assays included these. (Also, please specify if the human receptors were used.)

      No additional cysteine residues were mutated in either receptor. All exposed cysteines are predicted to form disulfides. The residues in the N-terminus that the reviewer alludes to, C21 and C26, form a disulfide (Gustavsson et al. Nature Communications 2017) and are thus protected from our probes. Consistent with these expectations, neither WT CXCR4 nor ACKR3 exhibited significant fluorophore labeling (now mentioned in the text on Pg. 5). The species of origin has been added to the materials and methods.

      (4) There are a few instances where the data seem to slightly diverge from the proposed models that may be helpful to comment on explicitly in the text:

      - Figure 4E (ACKR3/CXCL12(P2G)): As noted in the legend, despite stabilizing R*/R*', CXCL12(P2G) reduces transitions between these states compared to Apo. This is more similar to the effects of VUF16840 (Figure 3D) than the other ACKR3 agonists. The authors note the difference between CXCL12(LRHQ) and CXCL12(P2G) (but not vs Apo) in this regard. There might be some other information here regarding the relative importance of the conformational equilibrium vs transition rates for receptor activity.

      Although the TDPs for CXCL12<sub>P2G</sub> and VUF16840 are similar, as noted by the reviewer, the overall FRET envelopes are drastically different.

      The differences in transition probabilities for R ↔ R’ and R*’ ↔ R* transitions observed in the presence of CXCL12<sub>P2G</sub> or CXCL12<sub>LRHQ</sub> relative to the apo receptor are now explicitly noted in the Results.

      - The conformational distributions of ACKR3 apo and ACKR3 Y257L CXCL12 are very similar (Figure 5A,D). However, there is a substantial difference in the basal activity of WT vs CXCL12-stimulated Y257L (Figure S9).

      The mutation Y257L appears to promote the highest and lowest FRET states at the expense of the intermediates. Although the distributions of apo WT and CXCL12-bound Y257L appear similar, the depopulation of the R’ state may lead to the observed activation in cells.
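
      How the population of an individual state such as R’ can be compared between two conditions whose overall distributions look alike can be sketched as follows, assuming each trace has already been idealized (e.g. by HMM) into state labels. The function name and toy traces are hypothetical. Each frame counts equally here, so longer traces carry more weight; averaging per-molecule fractions is an equally reasonable alternative.

```python
from collections import Counter


def state_occupancy(paths):
    """Fraction of all idealized frames assigned to each state (hypothetical helper).

    paths: list of per-molecule lists of state labels, e.g. "R", "R'", "R*'", "R*".
    """
    counts = Counter(state for path in paths for state in path)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}


# Toy traces (hypothetical) for two conditions
condition_a = state_occupancy([["R", "R", "R'", "R*"], ["R'", "R'", "R*", "R*"]])
condition_b = state_occupancy([["R", "R", "R*", "R*"], ["R", "R*", "R*", "R*"]])
for state in ("R", "R'", "R*"):
    print(state, condition_a.get(state, 0.0), condition_b.get(state, 0.0))
```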

      (5) There are inconsistent statements regarding the compatibility of G protein binding to the "active-like" ACKR3 conformation observed in the authors' previous structures (Yen et al, Sci Adv 2022). In the introduction, the authors seem to be making the case that steric clashes cannot account for its lack of coupling; in the discussion, they seem to consider it a possibility.

      The introduction to previous research on the molecular mechanisms governing the lack of ACKR3-G protein coupling was not intended to be all encompassing, but rather to highlight previous efforts to elucidate this process and justify our study of the role of dynamics. Due to the positions of the probes, we can only comment on the impact on TM6 movements and not other conformational changes. The steric clash reported in Yen et al. was in ICL2 and not directly tested here, so our observations do not preclude changes occurring in this region. We also do not claim that the active-like state resolved in our previous structures matches any specific state isolated here by smFRET.

      (6) Line 83-85: "Having excluded other mechanisms we therefore surmised that the inability of ACKR3 to activate G proteins may be due to differences in receptor dynamics."

      Line 400-402: "It is possible that the active receptor conformation clashes sterically with the G protein as suggested by docking of G proteins to structures of ACKR3."

      As mentioned above, we suspect the mechanisms governing the inability of ACKR3 to couple to G proteins may be more complex than one particular feature but instead due to a combination of several factors. Accordingly, we have not completely eliminated a contribution of steric hindrance as we described in Yen et al. Sci Adv 2022 and instead include it as a possibility. Following the line highlighted here, we list several alternatives: 

      “Alternatively, the receptor dynamics and conformational transitions revealed here may prevent formation of productive contacts between ACKR3 and G protein that are required for coupling, even though G proteins appear to constitutively associate with the receptor.”

      And, at the end of the paragraph, we have added the following sentence: 

      “The atypical activation of ACKR3 does not appear to be dependent on any singular receptor feature and is likely a combination of several factors.”

      (7) If the authors believe that the various ligands/mutations are only altering the distribution/dynamics of the same 3/4 conformations of CXCR4/ACKR3, respectively, is there a reason each FRET efficiency histogram is fit independently instead of constraining the individual components to Gaussian components with the same centroids, and/or globally fitting all datasets for the same receptor?

      We performed global analysis across all data sets for each sample and condition. Since the peak positions of the various FRET states recovered in this way were consistent across treatments (Fig. S4,S6), we did not feel it was necessary to perform a further global analysis across all samples for a given receptor.
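
      The global fit the reviewer suggests can be sketched as an expectation-maximization (EM) fit of a one-dimensional Gaussian mixture in which the state centroids and widths are shared across conditions while each condition keeps its own state populations. This is an illustrative sketch assuming NumPy and per-condition arrays of apparent FRET values; it is not the HMM-based analysis used in the manuscript, and the function name and synthetic data are hypothetical.

```python
import numpy as np


def global_gmm(datasets, n_states, n_iter=200, min_sigma=1e-3):
    """EM fit of 1D Gaussian mixtures with means/widths shared across datasets.

    Hypothetical helper. datasets: list of 1D arrays of E_app values, one per
    condition. Each dataset has its own state weights; means and sigmas are global.
    Returns (means, sigmas, weights), weights having shape (n_datasets, n_states).
    """
    datasets = [np.asarray(d, dtype=float) for d in datasets]
    pooled = np.concatenate(datasets)
    # Initialise centroids at evenly spaced quantiles of the pooled data
    means = np.quantile(pooled, (np.arange(n_states) + 0.5) / n_states)
    sigmas = np.full(n_states, pooled.std() / n_states)
    weights = np.full((len(datasets), n_states), 1.0 / n_states)

    for _ in range(n_iter):
        resps = []
        sum_r = np.zeros(n_states)
        sum_rx = np.zeros(n_states)
        for d, x in enumerate(datasets):
            # E-step in log space (avoids underflow), using this dataset's weights
            z = (x[:, None] - means) / sigmas
            log_r = np.log(weights[d]) - 0.5 * z**2 - np.log(sigmas * np.sqrt(2 * np.pi))
            log_r -= log_r.max(axis=1, keepdims=True)
            r = np.exp(log_r)
            r /= r.sum(axis=1, keepdims=True)
            resps.append(r)
            # M-step for this dataset's own state populations
            weights[d] = r.mean(axis=0)
            sum_r += r.sum(axis=0)
            sum_rx += (r * x[:, None]).sum(axis=0)
        # M-step for the shared centroids and widths, pooling all datasets
        means = sum_rx / sum_r
        sum_rxx = sum((r * (x[:, None] - means) ** 2).sum(axis=0) for r, x in zip(resps, datasets))
        sigmas = np.maximum(np.sqrt(sum_rxx / sum_r), min_sigma)
    return means, sigmas, weights


# Synthetic example: two conditions, same two states, different populations
rng = np.random.default_rng(0)
cond1 = np.concatenate([rng.normal(0.25, 0.05, 700), rng.normal(0.85, 0.05, 300)])
cond2 = np.concatenate([rng.normal(0.25, 0.05, 200), rng.normal(0.85, 0.05, 800)])
means, sigmas, weights = global_gmm([cond1, cond2], n_states=2)
print(means, sigmas, weights)
```

      Because only the populations are free per condition, a state that genuinely shifts position under one ligand would show up as a poor fit rather than being absorbed into a moved centroid.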

      Reviewer #3 (Recommendations For The Authors):

      The manuscript is well-written, the arguments are easy to follow and the figures are helpful and clear. Here are a few questions/suggestions that the authors might want to address before the paper will be published:

      (1) Include a table with kinetic rates between states in SI and have a brief discussion in the main text to support the trends observed in transition probabilities.

      As noted above, determining rate constants for each of the state-to-state transitions will require a much larger set of experimental smFRET data than is currently available and will be the subject of future studies.

      (2) The argument of state similarity (Figure S4 and S6)... why are the profiles not Gaussian, like in the fits on Figures S3 and S5, respectively? I would also suggest that once the number of states is chosen to do a global fit, where the FRET values of a certain sub-state across different conditions for one receptor are shared.

      The state distributions presented in Figs. S4 and S6 (as well as throughout the rest of the paper) are derived from HMM fitting of the time traces themselves and are not constrained to be Gaussian, whereas the GMM analysis in Figs. S3 and S5 consists of Gaussian fits to the final apparent FRET efficiency histograms.

      Similar to our response to Reviewer 2 above, due to the consistency of the fitted peak positions obtained across different conditions for a given sample, we did not feel that further global analysis was necessary.

      (3) It is shown FRET changes from ~0.85 in the inactive (closed) state to ~0.25 in the active (open) state. How do these values match the expectations based on crystal structure and dye properties?

      As noted in our response to Reviewer 1, translating the apparent FRET values using the assumed Förster distances for A555/Cy5 (per FPbase) suggests a change in D-A distance of ~30 Å, whereas the expected change from structures is ~16 Å. We suspect this discrepancy is due to the lipids immediately adjacent to the fluorophores, which may lead to the probes being constrained in an extended position when TM6 moves outwards, thus also reporting the linker length in the distance change. Additionally, such interactions may constrain the donor and acceptor in unfavorable orientations for energy transfer, which would also reduce the FRET efficiency in the active state. Since the calculated D-A distance changes appear too large for GPCR activation, we have opted not to make any structural interpretations. Instead, all of our conclusions are based on resolving individual conformational states and quantifying their relative populations, which is based directly on the measured FRET efficiency distributions, not computed distances.
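
      The conversion referred to above can be sketched with the Förster relation E = 1/(1 + (r/R0)^6), i.e. r = R0 (1/E - 1)^(1/6). The R0 below is a hypothetical placeholder, not the FPbase value used by the authors. Because the change between E = 0.85 and E = 0.25 works out to about 0.45 R0, any uncertainty in R0, in the dye orientation factor, or from the missing gamma correction propagates directly into the inferred distance change.

```python
def fret_to_distance(e, r0):
    """Donor-acceptor distance from a FRET efficiency via E = 1 / (1 + (r / R0)**6).

    Strictly valid only for a corrected efficiency and an R0 that holds for the
    actual dye orientations (kappa^2 = 2/3 is the usual assumption).
    """
    if not 0.0 < e < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)


R0_ANGSTROM = 50.0  # hypothetical Foerster radius, for illustration only
closed_state = fret_to_distance(0.85, R0_ANGSTROM)
open_state = fret_to_distance(0.25, R0_ANGSTROM)
print(f"{closed_state:.1f} -> {open_state:.1f} angstrom (change {open_state - closed_state:.1f})")
```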

      (4) The results on the effect of CXCL12-P2G on CXCR4 are confusing...despite being an antagonist, this ligand stabilizes the "active state"...I am not sure if the explanation offered is sufficient that the opening of the intracellular cleft is not sufficient to drive the G protein coupling/activation.

      We agree that the explanation related to the opening of the intracellular cleft being insufficient to drive G protein coupling/activation is speculative, and we have removed that text. We now simply propose that the CXCL12 variants inhibit coupling of G proteins to CXCR4 or disrupt interactions necessary for signaling, as stated in the following text added to the Results on Pg. 8:

      “Despite the ability of CXCL12<sub>P2G</sub> and CXCL12<sub>LRHQ</sub> to stabilize the active R* conformation of CXCR4, both variants are known to act as antagonists (20). This suggests that the CXCL12 mutants inhibit CXCR4 coupling to G proteins not by suppressing the active receptor population but rather by increasing the dynamics of the receptor state-to-state transitions. Our results suggest that the helical movements considered classic signatures of the active state may not be sufficient for CXCR4 to engage productively with G proteins.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      We thank the Reviewer for being very supportive of the work and acknowledging how important it is to understand allosteric modulation in the spike and the potential of this knowledge to contribute to the design of novel therapeutic strategies (for example, disrupting or altering the allosteric networks within the spike can be a novel strategy for drug development against COVID-19). We address their comments below: 

      (1) The Reviewer states that although the strategy used to extract the responses has been "previously validated", the complexity of the interactions investigated requires "a robust statistical analysis, which is not shown quantitatively". 

      As the Reviewer points out, the D-NEMD approach has been previously validated in various protein systems ranging from soluble enzymes to integral membrane proteins, including the spike (e.g. [Kamsri et al. (2024) Biochem; Beer et al. (2024) Chem Sci; Oliveira et al. (2023) J Mol Cell Biol; Chan et al. (2023) JACS Au; Castelli et al. (2023) JACS; Castelli et al. (2023) Protein Sci; Oliveira et al. (2022) Comput Struct Biotechnol J; Gupta et al. (2022) Nat Comm; Oliveira et al. (2021) JACS; Galdadas et al. (2021) eLife; Abreu et al. (2019) Proteins; Oliveira et al. (2019) JACS; Oliveira et al. (2019) Structure]). The Kubo-Onsager relation is used to extract the evolution of the protein's response to a perturbation by comparing the equilibrium and nonequilibrium trajectories at equivalent points in time. The calculated responses at individual times are then averaged over all the repeats (210 repeats in the current work), and the standard error of the mean (SEM) is used to assess the significance of the average response. The SEM estimates how far the calculated mean is likely to deviate from the true population mean. Calculating the SEM allows us to determine how accurate the measured response is as an estimate of the population response and to assess the convergence of our calculations. The evolution of the average C<sub>α</sub> displacement and corresponding SEM values for each individual monomer can be visualised in detail in Figures S7-S9. We have added a new sentence to the Materials and Methods section in the Supporting Information, explicitly stating how the convergence and statistical significance of the responses were assessed.
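
      A minimal sketch of how such averaged responses and their SEMs can be computed, assuming the paired equilibrium and nonequilibrium C<sub>α</sub> coordinates have already been superimposed and stored as NumPy arrays. The array layout and function name are hypothetical. The sketch reduces each pair's difference vector to a magnitude before averaging, which is one simple choice; other implementations may average the difference vectors first, so this should not be read as the exact protocol used here.

```python
import numpy as np


def dnemd_response(eq_coords, neq_coords):
    """Mean C-alpha displacement and its SEM from paired D-NEMD trajectories.

    Hypothetical helper. eq_coords, neq_coords: arrays of shape
    (n_pairs, n_times, n_residues, 3) with superimposed C-alpha positions from
    each equilibrium run and the nonequilibrium run started from it, sampled at
    the same times after the perturbation. Returns (mean, sem), each of shape
    (n_times, n_residues).
    """
    # Difference at equivalent times for each pair, reduced to a magnitude
    disp = np.linalg.norm(neq_coords - eq_coords, axis=-1)
    n_pairs = disp.shape[0]
    mean = disp.mean(axis=0)
    # Standard error of the mean over the independent pairs
    sem = disp.std(axis=0, ddof=1) / np.sqrt(n_pairs)
    return mean, sem
```

      Because the SEM scales as 1/sqrt(n_pairs), quadrupling the number of nonequilibrium trajectories roughly halves the error bar on each averaged response.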

      (2) The Reviewer considers that the evidence presented in the paper "is compelling" but suggests performing a sequence analysis to facilitate the understanding of the results by the scientific community. 

      We thank the Reviewer for their excellent suggestion to perform a sequence analysis of the FA site region and its allosteric connections. Indeed, this analysis (Figure S24) clearly shows that several of the mutations, deletions and insertions in the Alpha, Beta, Gamma, Delta, and Omicron variants are located either in or near the regions of the protein shown to respond to the removal of linoleate from the FA site. These sequence changes affect the protein's responses, and are responsible for the differences in allosteric behaviour observed between variants, as described previously for the non-glycosylated spike [Oliveira et al. (2023) J Mol Cell Biol]. Furthermore, some variants, such as Beta, Gamma, and Omicron, contain residue substitutions at the FA site. For example, the lysine in position 417 in the ancestral spike is mutated to asparagine in Beta and Omicron and threonine in the Gamma variant. Another example is arginine 408 in the original protein, which has been replaced by asparagine in several Omicron sub-variants. 

      To summarise, the sequence analysis (Figure S24) supports our initial 3D analysis (Figure S25), indicating that many of the changes observed in the variants of concern are indeed in or close to the allosteric networks involving the FA site. We have now included the sequence analysis results in the current paper and added a new figure to Supporting Information showing the sequence alignments between the ancestral spike and different variants (Figure S24). 
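
      A minimal sketch of the kind of comparison involved, assuming a pairwise alignment of the ancestral and variant sequences has already been produced with '-' marking gaps. The function name and toy sequences are hypothetical; changes are reported in the familiar K417N-style notation, numbered on the reference sequence, so they can be checked against residue ranges that respond in the simulations.

```python
def list_changes(reference, variant, start=1):
    """List substitutions, deletions and insertions from a pairwise alignment.

    Hypothetical helper. reference, variant: aligned sequences of equal length,
    '-' marking gaps. start: residue number of the first reference residue.
    Insertions are labelled after the preceding reference residue number.
    """
    if len(reference) != len(variant):
        raise ValueError("aligned sequences must have equal length")
    changes = []
    resnum = start - 1
    for ref_aa, var_aa in zip(reference, variant):
        if ref_aa == "-":
            if var_aa != "-":          # residue present only in the variant
                changes.append(f"ins{resnum}{var_aa}")
            continue
        resnum += 1
        if var_aa == "-":
            changes.append(f"{ref_aa}{resnum}del")
        elif var_aa != ref_aa:
            changes.append(f"{ref_aa}{resnum}{var_aa}")
    return changes


print(list_changes("ACD-EF", "ACNKE-"))  # toy alignment, not spike sequence
```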

      (3) The Reviewer also has "minor considerations": first, they point to a discrepancy in the presentation of residue values S325 in the plots of Chains A, B, and C of Figure S3; second, they ask why several regions, such as RBM and Furin Site in figures S6, S7, and S8 show significant changes.

      To answer both points raised by the Reviewer, we need to start by explaining that the spike typically features 22 N-glycosylation sites and at least two O-glycan sites per monomer. These sites have been found to be heterogeneously populated in different experimental studies (e.g. [Watanabe et al. (2020) Science; Shajahan et al. (2020) Glycobiology; Zhang et al. (2021) Mol Cell Proteomics]). Given this, the spike model used as the starting point for this work reflects this heterogeneity, with asymmetric site-specific glycosylation profiles derived from the glycoanalytic data reported by Watanabe et al. for N-glycans [Watanabe et al. (2020) Science] and Shajahan et al. for O-glycans [Shajahan et al. (2020) Glycobiology]. This means that the glycan occupancy and composition for each site differ between the three monomers. For example, while monomer A has both O-glycan sites (T323 and S325) fully occupied, monomers B and C only contain the T323 O-glycan. A detailed description of the glycosylation of the spike model is given in the supporting information of [Casalino et al. (2020) ACS Cent Sci].

      Regarding the Reviewer's first minor point, the discrepancy in behaviour observed in Figure S3 for S325 is related to the fact that this glycosylation site is only occupied in monomer A, with no glycans present in this site in monomers B and C. 

      Regarding the second point, the differences observed in the responses between the three monomers in Figures S7-S9 are probably due to asymmetries in the protein dynamics introduced by the different glycosylation patterns in the monomers. 

      We have now added a new paragraph to the materials and methods section in the Supporting Information describing the asymmetric site-specific glycosylation profiles of the monomers.

      (4) Due to the complexity of the allosteric interactions observed, the Reviewer suggests including in the paper a "diagram showing the flow of allosteric interactions" or a "vector showing how the perturbation done in the FA Active site takes contact with other relevant regions". 

      This is an excellent suggestion to facilitate the visualisation of the allosteric networks. We have added a new figure to Supporting Information highlighting the allosteric pathways identified from the D-NEMD simulations and the direction of the propagation of the structural changes (Figure S26).

      Reviewer #2:

      We thank the Reviewer for their time in evaluating our manuscript and providing suggestions for improving it and ideas for further work. We are happy that the Reviewer found this to be a "nice paper" with the calculations "well done" and interesting results. We address their comments below: 

      (1) The Reviewer suggests improving the paper by adding a more detailed explanation of the D-NEMD simulations approach, a method that, although proposed decades ago, is still generally unfamiliar to the community. They also asked for "information on the convergence of the observables".

      As stated by the Reviewer, a dynamical approach to nonequilibrium molecular dynamics (D-NEMD) was first proposed in the seventies by Ciccotti et al. [Ciccotti et al. (1975) Phys Rev Lett; Ciccotti et al. (1979) J Stat Phys]. This approach combines MD simulations in equilibrium and nonequilibrium conditions. The rationale for the D-NEMD approach is simple and can be described as follows: if an external perturbation (e.g. binding/unbinding of a ligand) is introduced into a simulation sampling an equilibrium state, thereby starting a parallel nonequilibrium simulation, the structural response of the protein to the perturbation can be measured directly by comparing the equilibrium and nonequilibrium trajectories at equivalent points in time using the Kubo-Onsager relation, provided that enough sampling is gathered (for more details, please see the reviews [Balega et al. (2024) Mol Phys; Oliveira et al. (2021) Eur Phys J B; Ciccotti et al. (2016) Mol Simul]). Although conceptually simple, this approach is very powerful: it allows the evolution of the protein's dynamic response to the external perturbation to be computed while the convergence and statistical significance of that response are assessed directly, and the associated errors can be made as small as desired by increasing the number of nonequilibrium trajectories. Determining the statistical errors associated with the responses (through, e.g., the determination of the standard error of the mean, SEM) is essential to test whether the sampling gathered is sufficient. In this paper, the SEM was calculated for each average C<sub>α</sub> displacement value at times 0.1, 1 and 10 ns after the removal of linoleate, LA (see Figures S7-S9). The SEM indicates how accurate the measured response is as an estimate of the population response and allows us to assess the convergence of the results. 

      Generally, multiple (tens to hundreds) D-NEMD simulations are needed to achieve statistically significant results for biomolecular systems (for examples, see [Balega et al. (2024) Mol Phys; Oliveira et al. (2021) Eur Phys J B]). As such, the length of the D-NEMD simulations (typically 5 to 10 ns) reflects the balance between the computational resources available and the number of replicates needed to achieve statistically significant responses from the system. Following the Reviewer's suggestion, we have now added a brief description of the D-NEMD approach to the main manuscript and expanded the D-NEMD section in the Supporting Information with a more detailed description of the method, including adding a new figure showing a schematic representation of the D-NEMD approach (Figure S5) as well as explicitly stating the settings used in these simulations and how the statistical significance of the responses was assessed. 

      (2) The Reviewer suggests comparing the D-NEMD results with "more traditional analysis, such as correlation analysis, or community network analysis". 

      We agree with the Reviewer that this is an important comparison, which can provide a broader, more articulate and coherent picture of spike allostery and have, therefore, performed additional analysis. The dynamic cross-correlation analysis suggested by the Reviewer is a valuable tool for identifying the regions in the protein influenced by the FA site in equilibrium conditions. However, such an approach is not straightforwardly applicable to D-NEMD simulations, as these simulations are not in equilibrium. Nevertheless, as suggested by the Reviewer, we have determined the cross-correlation matrices for both the equilibrium and D-NEMD simulations (Figure S22), similar to those in our previous work [Galdadas et al. (2021) eLife] and [Oliveira et al. (2022) J Mol Cell Biol]. The analysis of these matrices can provide information about possible allosteric networks. In Figure S22, the cyan and blue regions represent moderate and high negative correlations between C<sub>α</sub> atoms, while orange and red regions correspond to moderate and high positive correlations. Negative correlations indicate residues moving in opposite directions (moving toward or away from each other). In contrast, positive values imply that the residues are moving in similar directions. We also note that, with collaborators, we have compared D-NEMD and other nonequilibrium and equilibrium MD analysis methods for allostery [Castelli et al. (2023) JACS].
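
      The normalised cross-correlation described above can be sketched as follows, assuming a NumPy array of superimposed C<sub>α</sub> coordinates from a single trajectory. The function name is hypothetical, and this is the textbook dynamic cross-correlation definition rather than necessarily the exact tool used to produce Figure S22. Atoms with zero fluctuation would give a division by zero and would need to be excluded.

```python
import numpy as np


def dynamic_cross_correlation(coords):
    """Normalised dynamic cross-correlation matrix of atomic fluctuations.

    Hypothetical helper. coords: array (n_frames, n_atoms, 3) of superimposed
    C-alpha positions. C_ij = <dr_i . dr_j> / sqrt(<|dr_i|^2> <|dr_j|^2>),
    where dr is the deviation from each atom's mean position; values lie in [-1, 1].
    """
    disp = coords - coords.mean(axis=0)                 # fluctuations about the mean
    cov = np.einsum("fik,fjk->ij", disp, disp) / coords.shape[0]
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)


# Toy trajectory: atoms 0 and 1 move together along x, atom 2 moves opposite
t = np.array([-1.0, 0.0, 1.0])
toy = np.zeros((3, 3, 3))
toy[:, 0, 0] = t
toy[:, 1, 0] = t
toy[:, 2, 0] = -t
print(dynamic_cross_correlation(toy))
```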

      The cross-correlation maps depicted in Figure S22 show moderate to high positive correlations between the FA sites and two of the three RBDs in the protein. This happens because each FA site sits at the interface between two neighbouring RBDs. Low to moderate negative and mildly positive correlated motions can also be observed between the FA site and the NTDs and fusion peptide surrounding regions, respectively. To facilitate the visualisation of the above-described motions, we have also mapped the statistical correlations for R408 and K417 (two FA site residues able to directly form salt-bridge interactions with the carboxylate head group of LA) on the protein's three-dimensional structure (Figure S23). Figure S23 highlights the patterns of movement described above and allows us to identify the regions whose motions are coupled to the FA site.

      Interestingly, some segments forming the signal propagation pathways, such as R454-K458 in all three monomers, and C525-K537 in monomers B and C, can also be identified from the cross-correlation matrices, showing moderate to high correlations with the FA site (Figures S22-S23). The cross-correlation maps computed from the equilibrium trajectories (with FA sites occupied with LA) show a slight increase in the dynamic correlations, mainly for the RBDs, compared to the maps obtained from the nonequilibrium trajectories (Figure S22). This indicates that the presence of LA in the FA site strengthens the connections between the FA site and other parts of the protein. 

      We have updated the manuscript to include the cross-correlation analysis, with two new figures added to Supporting Information: one depicting the cross-correlation maps for the D-NEMD and equilibrium simulations (Figure S22), and the other showing the statistical correlations for R408 and K417 (Figure S23). 

      (3) The Reviewer considers the observed connection between the fatty acid site and the heme/biliverdin site "interesting" and suggests "exploring the impact of ligand removal on this secondary site on the protein".

      Similarly to the Reviewer, we find the connection between the FA and the heme/biliverdin site fascinating and worthy of further investigation. The observed connection between these two sites shows the complexity of the allosteric effects in the spike. It would be interesting and informative to perform new equilibrium simulations of the heme/biliverdin spike complex and a new set of D-NEMD simulations in which this site is perturbed (e.g. through the removal of the heme group) to map the networks connecting this allosteric site to other functionally important regions of the spike, including the FA site and potentially other allosteric sites. These new simulations would allow us to assess the reversibility of the connection between the FA and heme/biliverdin sites and enhance our understanding of allosteric modulation in the spike and the role of the heme/biliverdin site in this process. However, due to the large size of the system and the associated computational demands, such simulations are not possible within the timeframe of the revision of this paper. These simulations would take many months to complete using our HPC resources. We also note that an experimental structure of the spike containing both heme and linoleate is not available. Further simulation analysis of the communication pathways involving the heme/biliverdin site is an excellent idea for future work.

      (4) The Reviewer "liked the mapping of existing mutations on the communication pathway" and suggested a more detailed study focusing on the effect of the mutations. 

      We fully agree with the Reviewer and consider that a detailed study focusing on the effect of the mutations, insertions, and deletions in the different glycosylated variants of concern (including new emerging ones) would be of great interest. Our previous work using D-NEMD on the non-glycosylated ancestral, Alpha, Delta, Delta plus and Omicron BA.1 spikes revealed significant differences in the allosteric responses to LA removal, with the changes in the variants affecting both the amplitude of the structural responses and the rates at which these rearrangements propagate within the protein [Oliveira et al. (2023) J Mol Cell Biol]. 

      Using the D-NEMD approach to systematically investigate the impact of each individual mutation and their contribution to the overall allosteric response of the glycosylated variants (similar to what we have done previously for the D614G mutation in the non-glycosylated protein [Oliveira et al. (2021) Comput Struct Biotechnol J]) would provide insights into the functional modulation of the spike. However, as noted above in point 3, spike simulations are highly computationally expensive, both in terms of processing and data storage requirements, because of the large size of the protein and the need for equilibrium and D-NEMD simulations. This makes the suggested mutational study unfeasible within the timeframe of the current revisions. It is, however, an excellent idea for future research.

      Reviewer #3:

      We thank the Reviewer for carefully reading and critically reviewing this work and recognising that the findings reported are "based on an impressive amount of sampling" and "meticulous" analysis. We address their comments below: 

      (1) The Reviewer considers that this work "does not clearly show any new findings" as it shows that the glycans do not significantly impact the internal networks in the protein.

      We respectfully disagree with the Reviewer. This work identifies new allosteric effects in the spike, specifically, the connection of the FA site with the heme binding site. The equilibrium simulations alone provide the first analysis of the effects of linoleate binding in the fully glycosylated spike. The finding that glycosylation does not significantly affect the allosteric pathways in the spike is itself important. Previous D-NEMD simulations investigated only the non-glycosylated spike [Oliveira et al. (2021) Comput Struct Biotechnol J; Oliveira et al. (2022) J Mol Cell Biol], leading to questions of whether the allosteric pathways were changed by glycosylation; our results here show that the main conclusions are reinforced, but that glycosylation does have some effect on the networks, and also on the speed of the dynamical response. To the best of our knowledge, our work is the first to analyse the impact of glycosylation on the allosteric networks in the spike. We show that even though the glycans on the exterior of the spike do not significantly alter the internal communication pathways in the protein, in some cases (for example, the glycans linked to N234, T373 and S375) they create direct connections between different regions, which may facilitate the propagation of the structural changes. 

      (2) The Reviewer suggests adding a "clear and concise description" of the D-NEMD approach to the manuscript.

      We appreciate that the use of the D-NEMD method to study biomolecular systems is relatively new, and so may be unfamiliar. As explained above in our response to Reviewer 2 (point 1), a brief description of the D-NEMD approach has now been included in the main manuscript. A detailed description of the method has also been added to Supporting Information, including a new figure representing the rationale for the approach (Figure S5). The interested reader is directed to previous applications and reviews for more details of the method (e.g. [Balega et al. (2024) Mol Phys; Oliveira et al. (2021) Eur Phys J B; Ciccotti et al. (2016) Mol Simul; Kamsri et al. (2024) Biochem; Beer et al. (2024) Chem Sci; Oliveira et al. (2023) J Mol Cell Biol; Chan et al. (2023) JACS Au; Castelli et al. (2023) JACS; Castelli et al. (2023) Protein Sci; Oliveira et al. (2022) Comput Struct Biotechnol J; Gupta et al. (2022) Nat Comm; Oliveira et al. (2021) JACS; Galdadas et al. (2021) eLife; Abreu et al. (2019) Proteins; Oliveira et al. (2019) JACS; Oliveira et al. (2019) Structure]). 

      (3) The Reviewer invites us to "discuss the robustness of the findings with respect to forcefield choices".

      The Reviewer raises an important but rather complex question, and one which can, of course, be posed for any molecular dynamics simulation study. The short answer is that we have chosen state-of-the-art forcefields, which have been shown to give results for the spike that are in good agreement with experiments; glycosylated spike simulations are rather computationally expensive, and constructing the models also requires significant human time and effort. Thus, while in principle interesting, it is not practical to repeat the current simulations with different forcefields. However, as detailed below, comparison of our simulations of the glycosylated and non-glycosylated [Oliveira et al. (2022) Comput Struct Biotechnol J] spike using different forcefields indicates that our conclusions are robust and are not dependent on the choice of forcefield. 

      Comparing the performance and accuracy of different force fields is not straightforward, as the results depend on the system of interest, the properties simulated and the sampling. In this work, the CHARMM36m all-atom additive force field was used to describe the protein and glycans. CHARMM36m is a widely used force field that has previously been validated for simulations of biological systems [Huang et al. (2013) J Comput Chem; Guvench et al. (2009) J Chem Theory Comput], including proteins, lipids and glycans, and has been adopted in many studies in the literature. Additionally, the glycosylated models of the spike used in this work have been successfully applied and tested before (e.g. [Dommer et al. (2023) Int J High Perform Comput Appl; Sztain et al. (2021) Nat Chem; Casalino et al. (2021) Int J High Perform Comput Appl; Casalino et al. (2020) ACS Cent Sci]), with their dynamics shown to correlate well with experimental data.   

      It is also worth pointing out that, despite differences in the amplitude of the responses, the allosteric networks identified using the D-NEMD approach for the non-glycosylated [Oliveira et al. (2022) Comput Struct Biotechnol J] and glycosylated spikes are generally similar (Figure S13). While the responses for the non-glycosylated protein were extracted from simulations using the AMBER99SB-ILDN forcefield [Oliveira et al. (2022) Comput Struct Biotechnol J], those reported in this work were obtained from trajectories using the CHARMM36m forcefield. The similarity between the responses for the two systems (which were simulated using different forcefields) is a good indication that our findings are forcefield independent. 

      (4) The Reviewer suggests comparing our findings with "alternative methods of analysing allostery". 

      As stated above in our response to Reviewer 2 point 2, we consider the suggested comparison an excellent idea. We have therefore performed a dynamic cross-correlation analysis to identify the regions in the protein coupled to the FA site under both equilibrium and nonequilibrium conditions (see Figures S22-S23). Overall, this analysis shows that the FA site motions are strongly coupled to the RBDs and moderately to weakly coupled to the NTDs and the regions surrounding the fusion peptide (please see a detailed description of the results of the correlation analysis in our response to Reviewer 2 point 2). The cross-correlation analysis has been added to the manuscript, and two new figures have been included in the Supporting Information (Figures S22-S23): the first shows the cross-correlation maps for the D-NEMD and equilibrium simulations; the second shows the statistical correlations for R408 and K417 (two residues that form the FA site and can directly interact with the carboxylate head group of LA). 
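      For readers less familiar with dynamic cross-correlation analysis, the sketch below computes the standard dynamic cross-correlation matrix, C_ij = <Δr_i · Δr_j> / sqrt(<|Δr_i|^2> <|Δr_j|^2>), where Δr_i is the displacement of atom i from its time-averaged position. It is a minimal NumPy illustration under stated assumptions (the trajectory is already superposed onto a reference, with one coordinate per residue, e.g. Cα atoms); it is not the analysis code used for Figures S22-S23, and the function name is hypothetical.

```python
import numpy as np


def dynamic_cross_correlation(coords: np.ndarray) -> np.ndarray:
    """Hypothetical helper: DCCM from coords of shape (n_frames, n_atoms, 3).

    Assumes the frames were already superposed onto a common reference,
    so global rotation/translation does not dominate the fluctuations.
    """
    # Fluctuation of each atom about its time-averaged position
    disp = coords - coords.mean(axis=0)
    # <dr_i . dr_j>: dot product over x, y, z, averaged over frames
    cov = np.einsum("tik,tjk->ij", disp, disp) / coords.shape[0]
    # Normalise by sqrt(<|dr_i|^2> <|dr_j|^2>), taken from the diagonal
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)
```

      Values near +1 indicate atoms moving in the same direction, values near -1 atoms moving in opposite directions, and values near 0 uncorrelated motion.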

      We agree that comparing different allosteric analysis methods is interesting, informative and important. As noted above, we have compared D-NEMD and other nonequilibrium and equilibrium MD analysis methods for allostery in the well-characterised K-Ras system [Castelli et al. (2023) JACS].

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:  

      Reviewer #1 (Public Review): 

      Summary:

      The authors use an innovative behavior assay (chamber preference test) and standard calcium imaging experiments on cultured dorsal root ganglion (DRG) neurons to evaluate the consequences of global knockout of TRPV1 and TRPM2, and overexpression of TRPV1, on warmth detection. They find a profound effect of TRPM2 elimination in the behavioral assay, whereas the elimination of TRPV1 has the largest effect on the neuronal responses. These findings are very important, as there is substantial ongoing discussion in the field regarding the contribution of TRP channels to different aspects of thermosensation.

      Strengths:

      The chamber preference test is an important innovation compared to the standard two-plate test, as it depends on thermal information sampled from the entire skin, as opposed to only the plantar side of the paws. With this assay, and the detailed analysis, the authors provide strong supporting evidence for a role of TRPM2 in warmth avoidance. The conceptual framework using the Drift Diffusion Model provides a first glimpse of how this decision of a mouse to change between temperatures can be interpreted and may form the basis for further analysis of thermosensory behavior.

      Weaknesses:

      The authors juxtapose these behavioral data with calcium imaging data using isolated DRG neurons. As the authors acknowledge, it remains unclear whether the clear behavioral effect seen in the TRPM2 knockout animals is directly related to TRPM2 functioning as a warmth sensor in sensory neurons. The effects of the TRPM2 KO on the proportion of warmth-sensing neurons are very subtle, and TRPM2 may also play a role in the behavioral assay through its expression in thermoregulatory circuits in the brain. Future behavioral experiments on sensory-neuron-specific TRPM2 knockout animals will be required to clarify this important point.

      Reviewer #1 (Recommendations for the authors):

      (1) I have no further suggestions for the authors, and congratulate them on their excellent study.

      For the authors information, ref. 42 does contain behavioral data from both male (Fig. 4 and Extended Figure 7) and female (Extended Figure 8) mice.

      We thank the referee for pointing out that both male and female mice were tested in the Vandewauw et al. 2018 study. We deliberated whether to include this in the appropriate section of our manuscript (“Limitations of the Study”). However, since Vandewauw et al. assessed noxious heat temperatures, whereas we assess innocuous warmth temperatures, we felt that this reference would not help clarify whether there are sex differences in TRP channel-based warmth sensing. In particular, we did not want to “use” this argument to suggest that there are no sex differences in the warmth range simply because Vandewauw et al. did not observe major sex differences in the noxious temperature range. 

      Reviewer #3 (Public Review):  

      Summary and strengths:

      In the manuscript, Abd El Hay et al investigate the role of the thermally sensitive ion channels TRPM2 and TRPV1 in warm preference and their dynamic response features to thermal stimulation. They develop a novel thermal preference task, in which both the floor and air temperature are controlled, and conclude that mice likely integrate floor with air temperature to form a thermal preference. They go on to use knockout mice to show that TRPM2 plays a role in the avoidance of warmer temperatures. Using a new approach for culturing DRG neurons, they show the involvement of both channels in warm responsiveness and dynamics. This is an interesting study with novel methods that generate important new information on the different roles of TRPV1 and TRPM2 in thermal behavior.

      Comments on revisions:

      Thanks to the authors for addressing all the points raised. They now include more details about the classifier, better place their work in context of the literature, corrected the FOVs, and explained the model a bit further. The new analysis in Figure 2 has thrown up some surprising results about cellular responses that seem to reduce the connection between the cellular and behavioral data and there are a few things to address because of this:

      (1) TRPM2-deficient responses: The differences in the proportion of responders between TRPM2-deficient and WT cultures are only observed at one amplitude (39 °C), and even at this amplitude the effect is subtle. Most surprisingly, TRPM2-deficient cells have an enhanced response to 33 °C compared with WT cells, but the same response amplitude as WT at 36 °C and 39 °C. The authors discuss why this disconnect might occur, but together with the lack of differences between WT and TRPM2-deficient mice in Fig 3, the data seem in good agreement with ref 7, namely that TRPM2 has little effect on DRG responses to warmth, in contrast to a larger effect of TRPV1. This does not take away from the fact that there is a behavioral phenotype in the TRPM2-deficient mice, but the impact of TRPM2 on DRG cellular warm responses is weak, and the authors should tone down or remove statements about the strength of TRPM2's impact throughout the manuscript, for example:

      "Trpv1 and Trpm2 knockouts have decreased proportions of WSNs."

      "this is the first cellular evidence for the involvement of TRPM2 on the response of DRG sensory neurons to warm-temperature stimuli"

      "we demonstrate that TRPV1 and TRPM2 channels contribute differently to temperature detection, supported by behavioural and cellular data"

      "TRPV1 and TRPM2 affect the abundance of WSNs, with TRPV1 mediating the rapid, dynamic response to warmth and TRPM2 affecting the population response of WSNs."

      "Lack of TRPV1 or TRPM2 led to a significant reduction in the proportion of WSNs, compared to wildtype cultures".

      We agree with the referee that the somewhat surprising result of the subtle phenotype in Trpm2 knock-out DRG culture experiments, that became detectable in the course of the new analysis, was overemphasized in the previous version of the manuscript. Per suggestion, we have toned down or removed the statements in the revised manuscript (for the referee to find those changes easily, they are indicated in “track-changes mode” in the submitted document).  

      (2) The new analysis also shows that the removal of TRPV1 leads to cellular responses with smaller responses at low stimulus levels but larger responses with longer latencies at higher stimulus levels. Authors should discuss this further and how it fits with the behavioral data.

      Because these changes shown in Fig. 2E are also subtle (similar to the cellular Trpm2 phenotype discussed above), and because both the “% Responders” (Fig. 2D) and the AUC analysis (Fig. 2F) show a reduction in Trpv1 knockout cultures at both lower and higher stimulus levels, we did not want to overstate this difference and therefore did not discuss this aspect further in the context of the behavioral differences observed in the Trpv1 knockout animals.  

      (3) Analysis clarification: authors state that TRPM2 deficient WSNs show "Their response to the second and third stimulus, however, are similar to wildtype WSNs, suggesting that tuning of the response magnitude to different warmth stimuli is degraded in Trpm2-/- animals." but is there a graded response in WT mice? It looks like there is in terms of the %responders but not in terms of response amplitude or AUC. Authors could show stats on the figure showing differences in response amplitude/AUC/responders% to different stimulus amplitudes within the WT group.

      We have added the statistics to the main text; they can be found on page 7 (also marked in “track changes mode”).

      (4) New discussion point: sex differences are "similar to what has been shown for an operant-based thermal choice assay (11,56)", but in their rebuttal, they mention that ref 11 did not report sex differences. 56 does. Check this.

      Thank you for pointing out this error. We have now corrected this in the “Limitations of the study” section of the Discussion, removed the Paricio-Montesinos et al. study from that section, and slightly revised the text (see “track-changes” on page 16).

      (5) The authors added in new text about the drift diffusion model in the results, however it's still not completely clear whether the "noise" is due to a perceptual deficit or some other underlying cause. Perhaps authors could discuss this further in the discussion.

      We have now included more discussion concerning this (page 14):

      “However, the increased noise in the drift-diffusion model points to a less reliable temperature detection mechanism. Although noise in drift-diffusion models can encompass various sources of variability (ranging from peripheral sensory processing to central mechanisms like attention or motor initiation), the most parsimonious interpretation in our study aligns with a perceptual deficit, given the altered temperature-responsive neuronal populations we observed. This implies that, despite the substantial loss of WSNs, the remaining neuronal population provides sufficient information for the detection of warmer temperatures, albeit with reduced precision.”

      Within the limits of the data that is available, we hope the referee agrees with us that we have now adequately discussed this aspect; we feel that any further discussion would be too speculative.
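      To illustrate why a larger noise term in a drift-diffusion model translates into less reliable choices, the sketch below simulates a generic symmetric two-bound drift-diffusion process and compares the simulated probability of reaching the "correct" (upper) bound with the standard closed-form result, P = 1 / (1 + exp(-2·v·a / σ²)), for drift v, bound ±a and noise σ, starting midway between the bounds. The parameter values are arbitrary and the function names are hypothetical; this is not the model fitted in the study.

```python
import numpy as np


def simulate_ddm(drift, noise, bound, n_trials=2000, dt=1e-3,
                 max_steps=100_000, seed=0):
    """Hypothetical generic DDM: evidence starts at 0, absorbed at +/-bound.

    Returns choices (+1 upper bound, -1 lower bound, 0 undecided) and
    decision times in the same time units as dt.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)
    choice = np.zeros(n_trials, dtype=int)
    rt = np.full(n_trials, np.nan)
    active = np.ones(n_trials, dtype=bool)
    for step in range(1, max_steps + 1):
        n = int(active.sum())
        if n == 0:
            break
        # Euler-Maruyama step: deterministic drift plus Gaussian noise
        x[active] += drift * dt + noise * np.sqrt(dt) * rng.standard_normal(n)
        hit_up = active & (x >= bound)
        hit_lo = active & (x <= -bound)
        choice[hit_up] = 1
        choice[hit_lo] = -1
        rt[hit_up | hit_lo] = step * dt
        active &= ~(hit_up | hit_lo)
    return choice, rt


def p_upper_analytic(drift, noise, bound):
    """Closed-form probability of hitting +bound first from x0 = 0."""
    return 1.0 / (1.0 + np.exp(-2.0 * drift * bound / noise**2))
```

      With the same drift and bound, doubling the noise lowers the expected accuracy from about 0.88 to about 0.62 (analytic values for v = 1, a = 1, σ = 1 versus σ = 2), which is the sense in which higher noise means less precise detection.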

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Authors of this article have previously shown the involvement of the transcription factor Zinc finger homeobox-3 (ZFHX3) in the function of the circadian clock and the development/differentiation of the central circadian clock in the suprachiasmatic nucleus (SCN) of the hypothalamus. Here, they show that ZFHX3 plays a critical role in the transcriptional regulation of numerous genes in the SCN. Using inducible knockout mice, they further demonstrate that the deletion of Zfhx3 induces a phase advance of the circadian clock, both at the molecular and behavioral levels.

      Strengths:

      - Inducible deletion of Zfhx3 in adults

      - Behavioral analysis

      - Properly designed and analyzed ChIP-Seq and RNA-Seq supporting the conclusion of the behavioral analysis

      Weaknesses:

      - Further characterization of the disruption of the activity of the SCN is required.

      (1) We thank the reviewer for their valuable input. Indeed, a comprehensive behavioral assessment of mice of this genotype was carried out in the Wilcox et al. (2017) study. In Figure 4 of Wilcox et al. (2017), a 6-h phase advance (jet lag) clearly showed faster re-entrainment in ZFHX3-KO mice compared with controls.

      - The description of the controls needs some clarification.

      (2) We agree with the reviewer and have modified the text at lines 211-212 to clearly describe the controls.

      Reviewer #2 (Public review):

      Summary:

      ZFHX3 is a transcription factor expressed in discrete populations of adult SCN and was shown by the authors previously to control circadian behavioral rhythms using either a dominant missense mutation in Zfhx3 or conditional null Zfhx3 mutation using the Ubc-Cre line (Wilcox et al., 2017). In the current manuscript, the authors assess the function of ZFHX3 by using a multi-omics approach including ChIPSeq in wildtype SCNs and RNAseq of SCN tissues from both wildtype and conditional null mice. RNAseq analysis showed a loss of oscillation in Bmal1 and changes in expression levels of other clock output genes. Moreover, a phase advance gene transcriptional profile using the TimeTeller algorithm suggests the presence of a regulatory network that could underlie the observed pattern of advanced activity onset in locomotor behavior in knockout mice.

      In Figure 1, the authors identified the ZFHX3-bound sites using ChIP-seq and compared the loci with other histone marks that occur at promoters, TSSs, enhancers and intergenic regions. The analysis broadly points to a role for ZFHX3 in transcriptional regulation. The vast majority of the nearly 40,000 peaks overlapped H3K4me3 and H3K27ac marks, i.e. active promoters, which also included genes falling under the GO category circadian rhythms. However, no significant differential ZFHX3-bound peaks were detected between ZT3 and ZT15. In these experiments, it is not clear whether and how the different ChIP samples (ZFHX3 and histone PTM ChIPs) were normalized/downsampled for analysis. Moreover, it seems that ZFHX3 binding or recruitment has little to do with whether the promoters are active.

      (3) We thank the reviewer for their valuable comment. The different ChIP samples (ZFHX3 and histone PTM ChIPs) were processed in the same manner, from preprocessing (quality control by FastQC, adapter trimming, alignment to the mm10 genome) to peak calling, which was performed with MACS2 using the respective input samples as controls, as mentioned in Methods. The data were normalized using the bamCoverage tool, and bigwig files were generated for visual inspection in the UCSC Genome Browser. These additional details have been added to Methods at line 592. Finally, BEDTools was used to study overlapping peaks between ZFHX3 and histone PTMs.
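      The peak-overlap step can be illustrated with a minimal Python sketch of what a BEDTools-style intersection reports: the ZFHX3 peaks that overlap at least one histone-mark peak, using BED half-open [start, end) coordinates. This is an illustration under stated assumptions, not the pipeline used in the study; it ignores BEDTools options such as minimum-overlap fractions, and the function and variable names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate

Interval = tuple[str, int, int]  # (chrom, start, end), BED half-open


def build_index(reference: list[Interval]) -> dict:
    """Per chromosome: sorted starts plus a running maximum of end coordinates."""
    by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_ends = list(accumulate((e for _, e in ivs), max))
        index[chrom] = (starts, max_ends)
    return index


def overlapping_peaks(query: list[Interval], reference: list[Interval]) -> list[Interval]:
    """Hypothetical helper: query peaks overlapping >= 1 reference peak."""
    index = build_index(reference)
    hits = []
    for chrom, start, end in query:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        # Reference intervals with start < end are the first i entries;
        # one of them overlaps iff the largest end among them exceeds start.
        i = bisect_left(starts, end)
        if i > 0 and max_ends[i - 1] > start:
            hits.append((chrom, start, end))
    return hits
```

      Dividing the number of hits by the number of query peaks gives the fraction of ZFHX3 peaks falling in marked regions.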

      We agree that the current data alone do not support a claim that ZFHX3 is crucial for promoter activity. Our data clearly suggest that the vast majority of ZFHX3 genomic binding in the SCN occurs at active promoters marked by H3K4me3 and H3K27ac, where ZFHX3 potentially regulates gene transcription.

      Based on an enrichment of ARNT domains next to K4me3 and K27ac PTMs, the authors propose a model in which the core-clock TFs and ZFHX3 interact. If the authors developed assays beyond predictions to test their hypothesis, it would strengthen the argument for a role in circadian transcription in the SCN. In this context, it would be important to perform a ChIP-seq experiment for ZFHX3 in the knockout animal (described from Figure 2 onwards) to eliminate the possibility of non-specific enrichment of signal from "open chromatin". Alternatively, a ChIP-seq analysis for BMAL1 or CLOCK could also strengthen this argument by identifying the sites co-occupied by ZFHX3 and core-clock TFs.

      (4a) We agree that follow-up experiments such as BMAL1/CLOCK ChIPseq suggested by the reviewer will further confirm the proposed interaction of ZFHX3 with core-clock TFs. However, this is beyond the scope of the current study. 

      (4b) Again, conducting complementary ChIP-seq in ZFHX3 knockout mice would strengthen the findings, but TF ChIP-seq in a specific brain tissue such as the SCN (unlike peripheral tissues such as liver) not only requires multiple animals per sample but is also technically challenging and time-consuming if sample specificity is to be ensured. For these reasons, datasets such as ours on the SCN are uncommon. Furthermore, in this particular context, we are confident, based on the current dataset, that the narrow ZFHX3 peaks we observed were well defined and met the specified statistical criteria, mitigating any risk of signal arising from non-specific enrichment in open-chromatin regions.

      Next, they compared locomotor activity rhythms in floxed mice with or without tamoxifen treatment. As reported before in Wilcox et al 2017, the loss of ZFHX3 led to a shorter free running period and reduced amplitude and earlier onset of activity. Overall, the behavioral data in Figure 2 and supplementary figure 2 has been reported before and are not novel.

      (5) We recognise that a detailed circadian behavior assessment from adult mice lacking ZFHX3 has been conducted previously by Nolan lab (Wilcox et al; 2017). In the current study, however, we used a separate cohort of mice, to focus on the behavioral advance noted in 24-h LD cycle and generated a more refined assessment. Importantly, these mice were also used for transcriptomic studies as detailed in Figure 3, which we consider to be a positive feature of our experimental design: behavior and molecular analyses were performed on the same animals.

      Next, the authors performed RNAseq at 4-h intervals on wildtype and knockout animals maintained in light/dark cycles to determine the impact of loss of ZFHX3. Overall, transcriptomic analysis indicated changes in the expression of nearly 36% of expressed genes, with nearly half being upregulated and an equal fraction downregulated. The pathways affected were mainly neuropeptide and neurotransmitter pathways. Surprisingly, there was no correlation between the direction of change in expression and TF binding, since nearly all the sites were bound by ZFHX3 and the active histone PTMs. A ChIP-seq experiment for ZFHX3 in the UBC-Cre+Tam mice could again help resolve the real targets of ZFHX3 and the transcriptional state in knockout animals.

      (6) We agree with the reviewer that most of the differentially expressed genes showed ZFHX3 binding at active promoter sites. That said, the current dataset is in line with recently published ZFHX3 ChIP-seq data from Baca et al. (2024) [PMID: 38412861] in human neural stem cells and Hu et al. (2024) [PMID: 38871709] in human prostate cancer cells, which clearly suggest that ZFHX3 binds at active promoters and acts as a chromatin remodeller/mediator that modulates gene transcription depending on the accessory TFs assembled at target genes. Therefore, the lack of correlation with the direction of change in expression is not surprising. 

      To determine the fraction of rhythmic transcripts, the authors used dryR to categorise the rhythmic transcriptome into modules that include genes that lose rhythmicity in the KO, gain rhythmicity in the KO, or remain unaffected or partially affected. The analysis indicates that a large fraction of the rhythmic transcriptome is affected in the KO model. However, among core-clock genes only Bmal1 expression is affected, showing a complete loss of rhythm. The authors state a decrease in Clock mRNA expression (line 294), but panel 4A does not show these data. Instead, it depicts the loss in Avp expression, which is misstated in line 321 ("we noted severe loss in 24-h rhythm for crucial SCN neuropeptides such as Avp (Fig. 3a)").

      (7a) Indeed, among the core-clock genes, rhythmic expression is lost after ZFHX3 knockout only for Bmal1. However, given that the mice were rhythmic (as assessed by wheel-running activity) in LD conditions, the 24-h gene expression rhythm observed in the majority of core-clock genes (Pers and Crys) is consistent with the behavioral data and suggests an altered molecular clock, with plausible scenarios as explained at line 439. That said, the distinct and well-defined changes in amplitude and phase shown in Figure 5 highlight a model in which ZFHX3 exerts differential control over clock genes. For example, the ~2-h advance in the molecular rhythm of Per2, with no such change in Cry, presents an opportunity to further delineate the regulation of TTFL genes.

      (7b) Line 294 revised as – “Bmal1 demonstrating a complete loss of 24-h rhythm (Fig. 4A), and its counterpart Clock mRNA showing overall reduced expression levels (Supplementary Table 3)”.

      (7c) Line 321 refers to the loss of Avp expression, and the typo has been corrected from “Figure 3a” to “Figure 4a”. Thank you. 

      However, core-clock genes such as Pers and Crys show minor or no change in expression patterns while Per2 and Per3 show a ~2hr phase advance. While these could only weakly account for the behavioral phase advance, the authors used TimeTeller to assess circadian phase in wildtype and ZFHX3 deficient mice. This approach clearly indicated that while the clock is not disrupted in the knockout animals, the phase advance can be correctly predicted from a network of gene expression patterns.

      Strengths:

      The authors use a multiomic strategy in order to reveal the role of the ZFHX3 transcription factor with a combination of TF and histone PTM ChIPseq, time-resolved RNAseq from wildtype and knockout mice and modeling the transcriptomic data using TimeTeller. The RNAseq experiments are nicely controlled and the analysis of the data indicates a clear impact on gene-expression levels in the knockout mice and the presence of a regulatory network that could underlie the advanced activity onset behavior.

      Weaknesses:

      It is not clear whether ZFHX3 has a direct role in any of these processes; it seems to be a general factor associated with H3K4me3- and H3K27ac-marked chromatin. Why it would specifically impact core-clock TTFL gene expression, or indeed daily gene expression rhythms, is not clear either. Details of how the different ChIP samples (ZFHX3 and histone PTM ChIPs) were treated for data normalization are needed. The complete loss of rhythmicity of Avp and other neuropeptides, or indeed of other TFs, could instead account for the transcriptional deregulation noted in the knockout mice.

      (8) We thank the reviewer for the constructive feedback. The current data suggest that ZFHX3 acts as a mediating factor, occupying active promoter sites at target genes and regulating gene expression by partnering with other key TFs in the SCN. Please see point 6 for clarification. The ZFHX3 binding sites showed clear enrichment for the E-box (CACGTG) motif bound by CLOCK/BMAL1, along with binding sites for key SCN-specific TFs such as RFX (please see Supplementary Fig. 1). Our data thus show that ZFHX3 affects both core-clock and clock output genes (to varying degrees), thereby exerting pervasive control over the SCN transcriptome.

      For treatment of ChIP samples please see point 3. We followed ENCODE guidelines strictly. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      - The early activity onset associated with a short photoperiod is a phenotype found in mice with a perturbed function of the SCN like Per2 mutant (PMID: 17218255), or Clock KO (PMID: 22431615). Such disruption of the SCN function also leads to a faster synchronization to day feeding (PMID: 23824542) or jetlag (PMID: 25063847; PMID: 24092737). Therefore, authors should study the synchronizing function of these mice to day feeding and/or jetlag.

      (9) Please see our response to point 1.

      - The description of the negative controls needs clarification. While the "Method" suggests that both Cre- and Cre+ mice are treated with Tamoxifen, the text rather suggest that the controls are Cre- and Cre+ animals non-treated by Tamoxifen. Because of the potential effect of Tamoxifen on gene expression, Cre- treated animals are a required control.

      (10) We thank the reviewer. As detailed in Methods, both Cre- and Cre+ mice were treated with tamoxifen and compared. The text has been revised at line 212. In addition, another genetic control (-Tamoxifen) was also used (Figures 2 and 3).

      - On line 486, the authors wrote "It is important to note that although in the present study we used adult-specific Zfhx3 null mutants resulting in global loss of ZFHX3, the effects observed both at molecular and behavioural levels are independent of its functional role(s) in other tissues." On what evidence is this statement based? Using a global KO rather suggests a potential role of other tissues.

      (11) We agree with the reviewer. At line 486, however, we meant that the effects observed on circadian behavior and on daily gene expression in the SCN are independent of pleiotropic roles of ZFHX3, such as its involvement in angiogenesis and spinocerebellar ataxia. We have revised the text.

      Reviewer #2 (Recommendations for the authors):

      It is not clear whether the behavioral experiments presented in this study were performed on a new set of animals, different from the cohort used in the Wilcox et al. 2017 paper. For example, the proportion of total activity graphed in Figure 2C looks strikingly similar to the activity counts in Figure 3A of the prior publication (doi: 10.1177/0748730417722631), down to the small burst in activity after ZT20 in the control (-Tam) group.

      (12) The behavioral experiments presented in this study were performed on a completely new cohort of mice, distinct from those used in Wilcox et al. 2017. The mice used for behavioral assessment in the current study were later used for molecular experiments. Please see point 5.

      Information on ChIP-seq, such as read length, paired-end (PE) or single-end (SE) sequencing, and the number of reads per replicate/condition/sample, is missing. Versions of the software used should be indicated if known.

      (13) The details are added as:

      (13a) “Briefly, SCN punches were pooled from 80 mice at each of the designated times (ZT3, ZT15), corresponding to one biological replicate per timepoint” at line 567.

      (13b) “24 µg of sheared chromatin collected from each time point (ZT3, ZT15)” at line 571.

      (13c) “75-bp single-end sequencing: 30 million reads/sample” at line 577.

      (13d) “MACS algorithm v2.1.0” added at line 584.

      Versions of the other software used were already mentioned.

    1. Author response:

      We thank the reviewers for their appreciation of our work and the recommendations to improve the manuscript. We have included a point-by-point response below. To summarize, for revision we plan to:

      • Clarify the manuscript to improve readability and coherence,

      • Ensure that all figures are thoroughly discussed in the text,

      • Tone down biological claims based on RNA velocity where applicable.

      While we agree with the reviewer that functional validation and/or spatial proteomics data accompanying this study could provide additional insights and broader contextualization, this is unfortunately beyond the scope of the study.

      Reviewer #1 (Public review):

      Summary:

      The authors conducted a spatial analysis of dysplastic colon tissue using the Slide-seq method. Their main objective is to build a detailed spatial atlas that identifies distinct cellular programs and microenvironments within dysplastic lesions. Next, they correlated these observations with clinical outcomes in human colorectal cancer.

      Strengths:

      The work is a good example of utilising spatial methods to study different tumour models. The authors identified a unique stem cell program to understand tumours gently and improve patient stratification strategies.

      Weaknesses:

      However, the study's predominantly descriptive nature is a significant limitation. Although the spatial maps and correlations between cell states are interesting observations, the lack of functional validation, primarily through experiments in mouse models, weakens the causal inferences regarding the roles these cellular programs play in tumour progression and therapy resistance.

      We thank the reviewer for this comment. Indeed, functional validation to pin down causal dependencies and a more thorough investigation of tumor progression and therapy resistance, both in mouse models and in human patients and/or patient-derived samples, would broaden the insights to be gained from this work. Unfortunately, this is beyond the scope of this study.

      The authors also missed an opportunity to link the mutational status of malignant cells with the cellular neighbourhoods.

      The data reported in this study contain spatial data for only one mouse model (AV). As spatial data for the other model (AKPV) are missing, it is not possible to link the mutational type of the model with the cellular neighborhoods. We did investigate whether there is additional "somatic" mutational heterogeneity in the AV data, regarding both single nucleotide variations (SNVs) and copy number variations (CNVs). However, at the time the mice were sacrificed (after 3 weeks), no significant mutational heterogeneity was detectable.

      Overall, the study contributes to profiling the dysplastic colon landscape. The methodologies and data will benefit the research community, but further functional validation is crucial to validate the biological and clinical implications of the described cellular interactions.

      Reviewer #2 (Public review):

      In their study, Avraham-Davidi et al. combined scRNA-seq and spatial mapping studies to profile two preclinical mouse models of colorectal cancer: Apcfl/fl VillinCreERT2 (AV) and Apcfl/fl LSL-KrasG12D Trp53fl/fl Rosa26LSL-tdTomato/+ VillinCreERT2 (AKPV). In the first part of the manuscript, the authors describe the analysis of the normal colon and dysplastic lesions induced in these models following tamoxifen injection. They highlight broad variations in immune and stromal cell composition within dysplastic lesions, emphasizing the infiltration of monocytes and granulocytes, the accumulation of IL-17+ γδT cells, and the presence of a distinct group of endothelial cells. A major focus of the study is the remodeling of the epithelial compartment, where the most significant changes are observed. Using non-negative matrix factorization, the authors identify molecular programs of epithelial cell functions, emphasizing stemness, Wnt signaling, angiogenesis, and inflammation as major features associated with dysplastic cells. They conclude that findings from scRNA-seq analyses in mouse models are transposable to human CRC. In the second part of the manuscript, the authors aim to provide the spatial context for their scRNA-seq findings using Slide-seq and TACCO. They demonstrate that dysplastic lesions are disorganized and contain tumor-specific regions, which contextualize the spatial proximity between specific cell states and gene programs. Finally, they claim that these spatial organizations are conserved in human tumors and associate region-based gene signatures with patient outcomes in public datasets. Overall, the data were collected and analyzed using solid and validated methodology to offer a useful resource to the community.

      Main comments:

      (1) Clarity

      The manuscript would benefit from a substantial reorganization to improve clarity and accessibility for a broad readership. The text could be shortened and the number of figure panels reduced to emphasize the novel contributions of this work while minimizing extensive discussions on general and expected findings, such as tissue disorganization in dysplastic lesions. Additionally, figure panels are not consistently introduced in the correct order, and some are not discussed at all (e.g., Figure S1D; Figure 3C is introduced before Figure 3A; several panels in Figure 4 are not discussed). The annotation of scRNA-seq cell states is insufficiently explained, with no corresponding information about associated genes provided in the figures or tables. Multiple annotations are used to describe cell groups (e.g., TKN01 = γδ T and CD8 T, TKN05 = γδT_IL17+), but these are not jointly accessible in the figures, making the manuscript challenging to follow. It is also not clear what each of the two mouse models and each time point of tissue collection contributes to the analysis.

      We thank the reviewer for this suggestion. For the revision we plan to clarify the manuscript to improve readability and coherence in text and figures, and expand on the cell type nomenclature.

      (2) Novelty

      While the study is of interest, it does not present major findings that significantly advance the field or motivate new directions and hypotheses. Many conclusions related to tissue composition and patient outcomes, such as the epithelial programs of Wnt signaling, angiogenesis, and stem cells, are well-established and not particularly novel. Greater exploration of the scRNA-seq data beyond cell type composition could enhance the novelty of the findings. For instance, several tumor microenvironment clusters uniquely detected in dysplastic lesions (e.g., Mono2, Mono3, Gran01, Gran02) are identified, but no further investigation is conducted to understand their biological programs, such as applying nNMF as was done for epithelial cells. Additional efforts to explore precise tissue localization and cellular interactions within tissue niches would provide deeper insights and go beyond the limited analyses currently displayed in the manuscript.

      We thank the reviewer for this comment. Our study aimed to spatially characterize the tumor microenvironment, with scRNA-seq analysis serving to support this spatial characterization.

      Due to technical limitations, such as the number of samples and the limited capture efficiency of Slide-seq, the resolution of immune cell identification in our spatial analysis is constrained. Additionally, while immune and stromal cells formed distinct clusters, epithelial cells exhibited a continuum that was better captured using nNMF.

      Lastly, our manuscript provides a general characterization of monocyte and granulocyte populations in scRNA-seq (line 142) and their spatial microenvironments (line 390). We believe that additional analyses of these populations would be beyond the scope of this study and could place an unnecessary burden on the reader. Instead, we suggest that such analyses be explored in future studies.

      We note that we analyzed tissue localization in two entirely different spatial transcriptomics assays (Slide-seq and Cartana) at the resolution of cell types and programs, which was feasible within the constraints of data sparsity, gene panel, and sample size in these experiments. One path to further increase the resolution of investigation in this dataset is to integrate other datasets, e.g. using the emerging transformer-based spatial transcriptomics integration methods, which unfortunately is outside the scope of the current study.

      We also remark that the current manuscript already includes an investigation of cellular interactions within tissue niches based on COMMOT (Fig 4k, Fig S8i, Supp Item 4).

      (3) Validation

      Several statements made by the authors are insufficiently supported by the data presented in the manuscript and should be nuanced in the absence of proper validation. For example:

      (a) RNA velocity analyses: The conclusions drawn from these analyses are speculative and need further support.

      We thank the reviewer for this comment. We will clarify that our conclusions from the RNA velocity analysis need further support by experimental validation, which is out of the scope of the study.

      (b) Annotations of epithelial clusters as dysplastic: These annotations could have been validated through morphological analyses and staining on FFPE slides.

      We thank the reviewer for this comment. While this could have been a possible approach, our study primarily relies on scRNA-seq, which does not preserve tissue morphology, and Slide-seq of fresh tissue, where such an analysis is particularly challenging.

      (c) Conservation of mouse epithelial programs in human tumors: The data in Figure S5B does not convincingly demonstrate the enrichment of stem cell program 16 in human samples. This should be more explicitly stated in the text, given the emphasis placed on this program by the authors.

      We thank the reviewer for pointing this out. Indeed, Figure S5B does not demonstrate the program 16 enrichment in human samples. We will clarify this in the manuscript.

      (d) Figure S6E: Cluster Epi06 is significantly overrepresented in spatial data compared to scRNA-seq, yet the authors claim that cell type composition is largely recapitulated without further discussion, which reduces confidence in other conclusions drawn.

      We thank the reviewer for this remark. Indeed, Epi06 was a cluster that drew our attention during early analyses because of its mixed expression profiles, with contributions from vastly different cell types. We concluded that this is best explained by doublets and excluded it from further analysis. In the current manuscript we only briefly hinted at this in the legend of Figure 2A ("Cluster Epi06: doublets (not called by Scrublet)"), and we will expand on this in the revised manuscript. That this cluster is significantly overrepresented in the annotation of the spatial data is not surprising in this context: this annotation comes from the decomposition of compositional data in which each Slide-seq bead contains contributions from multiple cells, making the beads structurally very similar to doublets. We will add this point to the revised manuscript as well.

      Furthermore, stronger validation of key dysplastic regions (regions 6, 8, and 11) in mouse and human tissues using antibody-based imaging with markers identified in the analyses would have considerably strengthened the study. Such validation would better contextualize the distribution, composition, and relative abundance of these regions within human tumors, increasing the significance of the findings and aiding the generation of new pathophysiological hypotheses.

      We agree with the reviewer's assessment that validation by antibody-based imaging (or other spatial proteomics data) would be a useful follow-up to the experiments and results presented in our manuscript, yet such experiments are beyond the scope of the current study.

    1. Author response:

      We thank the editor and reviewers for recognizing the value of studying neural dynamics and behavior in naturalistic, task-free conditions and the importance of linking olfactory bulb activity to movement and place.  We appreciate the suggestions for analyses and edits to further quantify these relationships and clarify our interpretation.

      The primary sticking point regards our result that olfactory bulb neurons are selective for place:

      “analysis supporting the potentially exciting result on the encoding of place is currently incomplete”

      In this paper, we report evidence for spatial selectivity in the olfactory bulb, make relative comparisons with canonical “place cells” in the hippocampus, and control for alternative hypotheses such as odor- or behavior-driven sources, to motivate future experiments that can more precisely identify the mechanistic basis of these responses. Throughout the reviews, our finding that OB activity correlates with place is not questioned; rather, the question is how much of this result can be explained by behavior or odor. Regarding the concern about behavior, we are confident, based on the analyses included in the paper, that spatial non-uniformities of breathing rhythms do not explain OB spatial selectivity. We thank the reviewers for suggesting additional analyses with which we can further test this claim; we will incorporate several, as detailed below.

      Regarding the points about odor, indeed we do not claim that we have entirely ruled out odors as an explanation of place selectivity in the bulb. Rather, our claim is that our analyses show that scent marks on the floor, the most obvious olfactory place cue, cannot fully explain place selectivity.  We acknowledge that our experiments do not exclude the possibility that other odors in the environment may also contribute. Odors are invisible and difficult to measure, and the odor sensitivity of rodents vastly outstrips that of any device known to humanity. Indeed, no study of which we are aware can fully rule out odor as a cue to the animal’s internal model of place. However, encoding of place, even if explained by odor, is still encoding of place. We will clarify our interpretation of the data, and we thank the reviewers for proposing ideas for further analysis, some of which we are implementing. However, experiments such as effects of distal cues on spatially selective olfactory bulb neurons are beyond the scope of this paper.

      We will further test whether neurons in the olfactory bulb are spatially selective by reporting additional statistical analyses including:

      - More completely quantifying the spatial distribution of sniffing patterns (visualized in Figure 8 - Sup 1) by plotting sniff-frequency distributions across locations in the arena.

      - Demonstrating independent contribution of place over speed in GLMs

      - Characterizing the temporal stability of spatially selective cells across a session (first half vs. second half)

      - Reporting mean decoding errors for olfactory bulb and hippocampal decoders (visualized in Fig 7C)

      We will add to the analyses of behavioral state models by:

      - Comparing the performance of hidden Markov models fit to breathing frequency alone with those fit to breathing frequency and movement speed

      - Quantifying individual differences in state-transition matrices

      Further, we address the question around the use of “grooming” as a descriptor of the intermediate sniff frequency state. We used the term ‘grooming’ based on extensive video observation. During this state, ‘Speed’ is significantly non-zero because we defined speed as the movement of the head keypoint which moves substantially during grooming. We will make this point more explicit in the figures and text, and we will provide additional video documentation of these and the other behavioral states.

      Lastly, we will further discuss the fact stated in the first paragraph of the Results section that mice are placed in “head-fixation on a stationary platform” and thus inhibited from running. While different breathing states than those observed in our stationary platform may occur during head-fixation with a treadmill, we believe the differences between head-fixed running and free moving running are beyond the scope of this paper. Nevertheless, it’s an important point that we will more explicitly discuss in our revision.

      We appreciate these constructive comments and hope these additional analyses and textual edits will help clarify our interpretations and motivate future experiments to further test and refine them.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper investigates the effects of the explicit recognition of statistical structure and sleep consolidation on the transfer of learned structure to novel stimuli. The results show a striking dissociation in transfer ability between explicit and implicit learning of structure, finding that only explicit learners transfer structure immediately. Implicit learners, on the other hand, show an intriguing immediate structural interference effect (better learning of novel structure) followed by successful transfer only after a period of sleep.

      Strengths:

      This paper is very well written and motivated, and the data are presented clearly with a logical flow. There are several replications and control experiments and analyses that make the pattern of results very compelling. The results are novel and intriguing, providing important constraints on theories of consolidation. The discussion of relevant literature is thorough. In sum, this work makes an exciting and important contribution to the literature.

      Weaknesses:

      There have been several recent papers that have identified issues with alternative forced choice (AFC) tests as a method of assessing statistical learning (e.g. Isbilen et al. 2020, Cognitive Science). A key argument is that while statistical learning is typically implicit, AFC involves explicit deliberation and therefore does not match the learning process well. The use of AFC in this study thus leaves open the question of whether the AFC measure benefits the explicit learners in particular, given the congruence between knowledge and testing format, and whether, more generally, the results would have been different had the method of assessing generalization been implicit. Prior work has shown that explicit and implicit measures of statistical learning do not always produce the same results (e.g. Kiai & Melloni, 2021, bioRxiv; Liu et al. 2023, Cognition).

      The authors argued in their response to this point that this issue could have quantitative but not qualitative impacts on the results, but we see no reason that the impact could not be qualitative. In other words, it should be acknowledged that an implicit test could potentially result in the implicit group exhibiting immediate structure transfer.

      We thank the reviewer for their feedback and added a statement in our discussion section acknowledging the possible effects of alternative measures of learning.

      Given that the explicit/implicit classification was based on an exit survey, it is unclear when participants who are labeled "explicit" gained that explicit knowledge. This might have occurred during or after either of the sessions, which could impact the interpretation of the effects and deserves discussion.

      We agree with the mentioned shortcoming in principle, although there are good methodological reasons for this, as discussed in our previous response. We added a statement on this topic to our discussion to make the potential issues and our reasoning in the design decision more transparent for the reader.

      Reviewer #2 (Public review):

      Summary:

      Sleep has been shown to support not only the strengthening of memory traces, but also their transformation. A special form of such transformation is the abstraction of general rules from the presentation of individual exemplars. The current work used large online experiments with hundreds of participants to shed further light on this question. In the training phase, participants saw composite items (scenes) that were made up of pairs of spatially coupled (i.e., adjacent) abstract shapes. In the initial training, they saw scenes made up of six horizontally structured pairs, and in the second training phase, which took place after a retention phase (2 min awake, 12 h including sleep, 12 h wake only, 24 h including sleep), they saw pairs that were horizontally or vertically coupled. After the second training phase, a two-alternative forced-choice (2-AFC) paradigm, in which participants had to identify true pairs versus randomly assembled foils, was used to measure performance on all pairs. Finally, participants were asked five questions to determine whether they had insight into the pair structure, and groups were assigned post hoc based on their answers. The main finding is that participants in the 2-minute retention experiment without explicit knowledge of the task structure performed at chance for the same structure in the second training phase, but above chance for the vertical structure. The opposite was true for both sleep conditions. In the 12 h wake condition, these participants showed no ability to discriminate the pairs from the second training phase at all.

      Strengths:

      All in all, the study was performed to a high standard and the sample size in the implicit condition was large enough to draw robust conclusions. The authors make several important statistical comparisons and also report an interesting resampling approach. There is also a lot of supplemental data regarding robustness.

      Weaknesses:

      My main concern regards the small sample size in the explicit group and the lack of experimental control.

      We thank the reviewer for the valuable feedback throughout the review process. The issues mentioned here have been addressed in our previous response.

      Reviewer #3 (Public review):

      In this project, Garber and Fiser examined how the structure of incidentally learned regularities influences subsequent learning of regularities, that either have the same structure or a different one. Over a series of six online experiments, it was found that the structure (spatial arrangement) of the first set of regularities affected learning of the second set, indicating that it has indeed been abstracted away from the specific items that have been learned. The effect was found to depend on the explicitness of the original learning: Participants who noticed regularities in the stimuli were better at learning subsequent regularities of the same structure than of a different one. On the other hand, participants whose learning was only implicit had an opposite pattern: they were better in learning regularities of a novel structure than of the same one. However, when an overnight sleep separated the first and second learning phases, this opposite effect was reversed and came to match the pattern of the explicit group, suggesting that the abstraction and transfer in the implicit case were aided by memory consolidation.

      In their revision the authors addressed my major comments successfully and I commend them for that.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      We would encourage the authors to add text to the manuscript that acknowledges/discusses the two issues pointed out in our review.

      We added relevant passages to the discussion section of the manuscript.

      Reviewer #2 (Recommendations for the authors):

      The authors have improved some sections of the manuscript and this is reflected in my assessment. The major weaknesses remain unchanged. Since my review is published alongside the paper, readers can make up their own mind regarding their severity.

      My only hard ask would be to add to the main manuscript that the study was not preregistered, as I asked before! I am surprised that the authors are so reluctant to honestly state this fact....

      We have not stated this fact in our manuscript until now since our understanding is that papers that report preregistered studies state and cite their preregistration in their method section, while any omission of such a statement by default conveys that no preregistration occurred. In fact, we cannot recall encountering papers with statements of no-preregistration in the literature. Nevertheless, we have no issue stating that our study was not preregistered and per the reviewer's request, we have added such an explicit statement in our manuscript.

      Reviewer #3 (Recommendations for the authors):

      *  I strongly urge the authors to remove the Results sub-sections from Methods.

      We thank the reviewer for highlighting this issue arising from our previous layout, which we decided to handle as follows. We relabeled the subsections in question as “Additional Analyses” to avoid confusion, removed any redundant findings already reported in the Results of the main text, and moved a small number of more substantial findings from the Methods section to the main-text Results, as requested. We believe that this solution is the most readable option: it does not clutter the main results with extensive sanity checks and results of minor interest, and it does not require experiment-wise results sections in the Supplementary Materials, which would further disperse information that interested readers might look for.

      *  Authors report that in Experiment 4 "Participants with explicit knowledge (n=23) show the same pattern of results as they did in Experiment 1", but that seems inaccurate, as they did learn novel pairs in Exp4 whereas they did not in Exp1. This can be seen in the figure and also in Methods-Results: "performing above chance for ... pairs of a novel structure (M=69.6, SE=5.9, d=0.69, t(22)=3.33 p=0.012, BF=13.6) in the second training phase"

      We thank the reviewer for pointing out this error in our interpretation of the results and adjusted the section in question to better align with what our result actually shows.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Multiple compounds that inhibit ATP-sensitive potassium (KATP) channels also chaperone channels to the surface membrane. The authors used an artificial intelligence (AI)-based virtual screening (AtomNet) to identify novel compounds that exhibit chaperoning effects on trafficking-deficient disease-causing mutant channels. One compound, which they named Aekatperone, acts as a low affinity, reversible inhibitor and effective chaperone. A cryoEM structure of KATP bound to Aekatperone showed that the molecule binds at the canonical inhibitory site.

      Strengths and weaknesses:

      The details of the AI screening itself are inevitably opaque, but appear to differ from classical virtual screening in not involving any physical docking of test compounds into the target site. The authors mention criteria that were used to limit the number of compounds, so that those with high similarity to known binders and 'sequence identity' (does this mean structural identity?) were excluded. The identified molecules contain sulfonylurea-like moieties. How different are they from other sulfonylureas?

      We thank the reviewers for the questions. As part of the library preparation, molecules with greater than 0.5 Tanimoto similarity in ECFP4 space to any known binders of the target protein and its homologs within 70% sequence identity were excluded to increase the possibility of identifying novel hits. After scoring and ranking the molecules by the AtomNet® technology, a diversity clustering was performed using the Butina algorithm (Butina D. Unsupervised Data Base Clustering Based on Daylight’s Fingerprint and Tanimoto Similarity: A Fast and Automated Way To Cluster Small and Large Data Sets, J. Chem. Inf. Comput. Sci. 1999, 39, 747–750) with a Tanimoto similarity cutoff of 0.35 in ECFP4 space to minimize selection of structurally similar scaffolds for the final compound buy-list. We have revised the results and methods sections to make this clear.
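      The two similarity-based steps described above (excluding compounds with Tanimoto similarity greater than 0.5 to known binders, then Butina clustering at a 0.35 cutoff to diversify the buy-list) can be sketched as follows. This is a minimal, hypothetical illustration, not the AtomNet® pipeline. Fingerprints are toy sets of on-bit indices; real ECFP4 bit vectors would come from a cheminformatics toolkit. The homolog selection (70% sequence identity) and the AtomNet® scoring and ranking steps are not modelled. Treating two compounds as Butina neighbours when their similarity is at least 0.35 (Tanimoto distance at most 0.65) is our assumption about how the cutoff was applied. All function names and compound labels are hypothetical.

```python
from itertools import combinations

# Hypothetical stand-in for an ECFP4 bit vector: the set of "on" bit indices.
Fingerprint = frozenset[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |A & B| / |A | B| between two bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def exclude_known_like(library: dict[str, Fingerprint],
                       known_binders: list[Fingerprint],
                       max_similarity: float = 0.5) -> dict[str, Fingerprint]:
    """Drop any compound whose similarity to some known binder exceeds max_similarity."""
    return {name: fp for name, fp in library.items()
            if all(tanimoto(fp, k) <= max_similarity for k in known_binders)}


def butina_clusters(fps: dict[str, Fingerprint],
                    sim_cutoff: float = 0.35) -> list[list[str]]:
    """Butina (1999) sphere-exclusion clustering; each cluster lists its centroid first."""
    names = list(fps)
    neighbours: list[set[int]] = [set() for _ in names]
    for i, j in combinations(range(len(names)), 2):
        if tanimoto(fps[names[i]], fps[names[j]]) >= sim_cutoff:
            neighbours[i].add(j)
            neighbours[j].add(i)
    # Visit compounds with the most neighbours first; ties broken by input order.
    order = sorted(range(len(names)), key=lambda i: (-len(neighbours[i]), i))
    assigned: set[int] = set()
    clusters: list[list[str]] = []
    for centroid in order:
        if centroid in assigned:
            continue  # already absorbed into an earlier cluster
        members = [centroid] + sorted(n for n in neighbours[centroid] if n not in assigned)
        assigned.update(members)
        clusters.append([names[m] for m in members])
    return clusters


# Toy usage with made-up fingerprints (not compounds from the actual screen).
known = [frozenset({1, 2, 3, 4})]
library = {
    "A": frozenset({1, 2, 3, 4, 5}),   # similarity 0.8 to the known binder -> excluded
    "B": frozenset({10, 11, 12, 13}),
    "C": frozenset({10, 11, 12, 14}),  # similarity 0.6 to B -> same cluster as B
    "D": frozenset({20, 21, 22, 23}),
    "E": frozenset({1, 2, 30, 31}),    # similarity 2/6 to the known binder -> kept
}
kept = exclude_known_like(library, known)
clusters = butina_clusters(kept)
buy_list = [cluster[0] for cluster in clusters]  # one representative (centroid) per cluster
print(clusters)  # [['B', 'C'], ['D'], ['E']]
print(buy_list)  # ['B', 'D', 'E']
```

Taking each cluster's centroid is one simple way to turn clusters into a structurally diverse buy-list. The response does not say how representatives were chosen, so this step is also an assumption.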

      Sulfonylureas are defined by their core structure comprising a sulfonyl group (–S(=O)₂–) and a urea moiety (–NH–CO–NH–). While some compounds identified in our study contain a sulfonamide group (R–S(=O)₂–NR₂), they differ structurally from sulfonylureas by lacking the key urea group and by incorporating unique R-group substitutions (we have now added this to the Figure 1A legend). For example, compound C27 (Z2068224500) includes a sulfonamide group but not a urea moiety. Likewise, C45 (Aekatperone, Z1620764636) contains a sulfonamide group along with an aromatic, nitrogen-rich heterocyclic ring, but no urea group. Additionally, the R-groups in these compounds are more complex than the simple aromatic or alkyl chains typical of sulfonylureas. They include heterocyclic aromatic systems and nitrogen-rich structures, which likely influence their binding properties and lipophilicity. These structural differences suggest distinct functional and pharmacological profiles, as supported by our biochemical and functional studies.

      The experimental work confirming that Aekatperone acts to traffic mutant KATP channels to the surface and acts as a low affinity, reversible, inhibitor is comprehensive and clear, with very convincing cell biological and patch-clamp data, as is the cryoEM structural analysis, for which the group are leading experts. In addition to the three positive chaperone-effective molecules, the authors identified a large number of compounds that are predicted binders but apparently have no chaperoning effect. Did any of them have inhibitory action on channels? If so, does this give clues to separating chaperoning from inhibitory effects?

      This is an interesting question. Evidence from cryo-EM, biochemical and electrophysiology studies reveals a critical role of the Kir6.2 N-terminus in K<sub>ATP</sub> channel assembly and gating, and shows that pharmacological chaperones like glibenclamide, repaglinide, carbamazepine, and now Aekatperone exert their chaperoning and inhibitory effects by stabilizing the interaction between the Kir6.2 N-terminus and the SUR1-ABC core. This stabilization, while promoting the assembly of Kir6.2 and SUR1 to “chaperone” trafficking-impaired mutant channels to the cell surface, also inhibits the channel by restricting the Kir6.2 C-terminal domain from rotating to an open state. An additional mechanism by which these compounds inhibit channel activity is by preventing SUR1-NBD dimerization, which mediates physiological activation of the channel by MgADP (see review: Driggers CM, Shyng SL. Mechanistic insights on K<sub>ATP</sub> channel regulation from cryo-EM structures. J Gen Physiol. 2023 Jan 2;155(1): e202113046, PMID: 36441147). From our compound screening, we did find some compounds that showed mild inhibition of the channel by electrophysiology but no obvious chaperone effects by western blots. It is possible that small chaperoning effects of some compounds showing mild channel inhibition were missed due to the lower sensitivity of the western blot assay compared to electrophysiology. Alternatively, these compounds could inhibit channels by preventing SUR1-NBD dimerization without stabilizing the Kir6.2 N-terminus, which is required for the chaperone effect based on our model. Unfortunately, we did not find any compounds that show chaperone effects without channel inhibition, which is consistent with our understanding of how this type of K<sub>ATP</sub> chaperone works (i.e., by stabilizing the interaction of the Kir6.2 N-terminus with SUR1’s ABC core).

      The authors suggest that the novel compound may be a promising therapeutic for treatment of congenital hyperinsulinism due to trafficking-defective KATP mutations, because it is a low-affinity, reversible inhibitor. This is a very interesting concept, and perhaps a pulsed dosing regimen would allow trafficking without constant channel inhibition (which otherwise defeats the therapeutic purpose), although it is unclear whether the new compound will offer advantages over earlier low-affinity sulfonylurea inhibitor chaperones. These include tolbutamide, which has very similar affinity and effect to Aekatperone. As the authors point out, tolbutamide (as well as other sulfonylureas) is currently out of favor because of potential adverse cardiovascular effects, but again, it is unclear why Aekatperone should not raise the same concerns.

      We thank the reviewer for the comments. This is clearly an important question to address in the future. While we have not directly tested the effects of Aekatperone on cardiac functions, we did assess its inhibitory effect on cells expressing the cardiac K<sub>ATP</sub> channel isoform (SUR2A/Kir6.2). Our results indicate that Aekatperone exhibits higher sensitivity toward the pancreatic K<sub>ATP</sub> channel isoform (SUR1/Kir6.2) compared to the cardiac isoform. However, we acknowledge that Aekatperone could still have cardiotoxic effects through its potential action on other channels, such as the hERG channel.

      It is worth noting that tolbutamide, despite its known cardiotoxic effects, does not exert these effects through cardiac K<sub>ATP</sub> channel inhibition. This has been demonstrated in studies showing no inhibitory effect of tolbutamide on SUR2A/Kir6.2 channels and on channels formed by Kir6.2 and SUR1 harboring the S1238Y mutation (also shown as S1237Y in some studies using a different SUR1 isoform), the amino acid substitution found in SUR2A at the corresponding position (Ashfield R, Gribble FM, Ashcroft SJ, Ashcroft FM. Identification of the high-affinity tolbutamide site on the SUR1 subunit of the K<sub>ATP</sub> channel. Diabetes. 1999 Jun;48(6):1341-7, PMID: 10342826). This suggests that tolbutamide’s cardiotoxic effects might involve other targets like the hERG channel. Interestingly, tolbutamide contains a hydrophobic tail and aromatic rings that align well with the structural features for hERG interaction (Garrido A, Lepailleur A, Mignani SM, Dallemagne P, Rochais C. hERG toxicity assessment: Useful guidelines for drug design. Eur J Med Chem. 2020 Jun 1;195:112290, PMID: 32283295). In contrast, high-affinity sulfonylureas such as glibenclamide and glimepiride, which have additional benzamide moieties, are associated with lower cardiovascular risks (Douros A, Yin H, Yu OHY, Filion KB, Azoulay L, Suissa S. Pharmacologic Differences of Sulfonylureas and the Risk of Adverse Cardiovascular and Hypoglycemic Events. Diabetes Care. 2017, 40:1506-1513, PMID: 28864502).

      Given these considerations, a comprehensive assessment of Aekatperone’s potential cardiotoxicity is crucial. Future studies involving in silico modeling, in vitro, and in vivo experiments will be essential to evaluate Aekatperone’s interaction with hERG and other off-target effects. These efforts will help clarify its safety profile. This point has now been added to the Discussion.

      Reviewer #2 (Public review):

      Summary:

      In their study 'AI-Based Discovery and CryoEM Structural Elucidation of a KATP Channel Pharmacochaperone', ElSheikh and colleagues undertake a computational screening approach to identify candidate drugs that may bind to an identified binding pocket in the SUR1 subunit of KATP channels. Other KATP channel inhibitors such as glibenclamide have been previously shown to bind in this pocket, and in addition to inhibiting KATP channel function, these inhibitors can very effectively rescue cell surface expression of trafficking-deficient KATP mutations that cause excessive insulin secretion (Congenital Hyperinsulinism). However, a challenge for their utility in treating hyperinsulinism has been that they are powerful inhibitors of the channels that are rescued to the cell surface. In contrast, successful therapeutic pharmacochaperones (e.g., CFTR chaperones) permit function of the channels rescued to the cell membrane. Thus, a key criterion for the authors' approach in this case was to identify relatively low-affinity compounds that target the glibenclamide binding site (and can be washed off); these could potentially rescue KATP surface expression, but also permit KATP function.

      Strengths:

      The main findings of the manuscript include:

      (1) Computational screening of a large virtual compound library, followed by functional screening of cell surface expression, which identified several potential candidate pharmacochaperones that target the glibenclamide binding site.

      (2) Prioritization and functional characterization of Aekatperone as a low-affinity KATP inhibitor that can be readily 'washed off' in patch-clamp and cell-based efflux assays. Thus the drug clearly rescues cell surface expression, but can be manipulated experimentally to permit function of rescued channels.

      (3) Determination of the binding site and dynamics of this candidate drug by cryo-EM, and functional validation of several residues involved in drug sensitivity using mutagenesis and patch clamp.

      The experiments are well-conceived and executed, and the study is clearly described. The results of the experiments are very straightforward and clearly support the conclusions drawn by the authors. I found the study to provide important new information about KATP chaperone effects of certain drugs, with interesting considerations in terms of ion channel biology and human disease.

      Weaknesses:

      I don't have any major criticisms of the study as described, but I had some remaining questions that could be addressed in a revision.

      (1) The chaperones can effectively rescue KATP trafficking mutants, but clearly not as strongly as the higher-affinity inhibitor glibenclamide. Is this relationship between inhibitory potency and efficacy of trafficking an intrinsic challenge of the approach? I suspect that it may be an intractable problem in the sense that the inhibitor-bound conformation that underlies the chaperone effect cannot be uncoupled from the inhibited gating state. But this might not be true (many partial agonist drugs with low efficacy can be strongly potent, for example). In this case, the approach is really to find a 'happy medium' of a drug that is a weak enough inhibitor to be washed away, but still strong enough to exert some satisfactory chaperone effect. Could some additional clarity be added in the Discussion on whether the chaperone and gating effects can be 'uncoupled'?

      Thank you for the suggestion. A similar question was raised by Reviewer 1, which was addressed above (public review, point 2). We have now added more discussion to clarify this point.

      (2) Based on the western blots in Figure 2B, the rescue of cell surface expression appears to require a higher concentration of AKP compared to the concentration response of channel inhibition (~9 microM in Figure 3, perhaps even more potent in patch clamp in Figure 2C). Could the authors clarify/quantify the concentration response for trafficking rescue?

      Thank you for bringing up this observation. Indeed, the pharmacochaperone effects of Aekatperone as well as other previously published K<sub>ATP</sub> pharmacochaperones require higher concentrations compared to their inhibitory effects on surface-expressed channels. This difference likely stems from the necessity for these compounds to cross the cell membrane and interact with newly synthesized channels in the endoplasmic reticulum, where the trafficking rescue occurs. We estimate that effective pharmacochaperone activity for Aekatperone can be achieved at concentrations ranging from 50 to 100 µM in cells expressing trafficking-deficient K<sub>ATP</sub> channel mutants, higher than that required for inhibition of surface-expressed channels (~9 µM IC50). Future work could focus on medicinal chemistry modifications, for example esterification of Aekatperone (Zhou G. Exploring Ester Prodrugs: A Comprehensive Review of Approaches, Applications, and Methods. Pharmacology & Pharmacy, 2024, 15, 269-284). Once inside the cell, the esters would be cleaved by endogenous esterases to release the active compound, ensuring efficient intracellular delivery. This strategy could potentially improve membrane permeability and bioavailability of the compound, which would lower the required concentrations to achieve desired chaperoning effects.

      (3) A future challenge in the application of pharmacochaperones of this type in hyperinsulinism may be the manipulation of chaperone concentration in order to permit function. In experiments it is straightforward to wash off the chaperone, but this would not be the case in an organism. I wondered if the authors had attempted to rescue channel function with diazoxide in the presence of AKP, rather than after washing it off (i.e., is AKP inhibition insurmountable, or can it be overcome by sufficient diazoxide?).

      Thank you for raising this important point. We have previously shown (Martin GM et al. Pharmacological Correction of Trafficking Defects in ATP-sensitive Potassium Channels Caused by Sulfonylurea Receptor 1 Mutations. J Biol Chem. 2016, 291: 21971-21983, PMID: 27573238) that diazoxide, which stabilizes K<sub>ATP</sub> channels in an open conformation, also reduces physical association between Kir6.2 N-terminus and SUR1 as demonstrated by reduced crosslinking of engineered azido-phenylalanine (an unnatural amino acid) at Kir6.2 N-terminal amino acid 12 position to SUR1. Incubating cells with diazoxide did not rescue the trafficking mutants but actually further reduced the maturation efficiency of trafficking mutants. For this reason, we did not include diazoxide during Aekatperone incubation and instead added diazoxide after Aekatperone washout to potentiate the activity of mutant channels rescued to the cell surface. In vivo, we envision testing alternating Aekatperone and diazoxide dosing to maximize functional rescue of K<sub>ATP</sub> trafficking mutants.

      (4) Do the authors have any information about the turnover time of KATP after washoff of the chaperone (how stable are the rescued channels at the cell surface)? This is a difficult question to probe when glibenclamide is used as a chaperone, but maybe much simpler to address with a lower affinity chaperone like AKP.

      Thank you for your thoughtful comment. While we have not yet tested the duration of rescued K<sub>ATP</sub> channels at the cell surface following Aekatperone washout, we have conducted similar studies with carbamazepine (Chen PC et al. Carbamazepine as a novel small molecule corrector of trafficking-impaired ATP-sensitive potassium channels identified in congenital hyperinsulinism. J Biol Chem. 2013, 288: 20942-20954, PMID: 23744072), another compound exhibiting reversible inhibitory and chaperone effects (with an apparent affinity intermediate between those of glibenclamide and Aekatperone). Our previous findings with carbamazepine showed that in cultured cells its chaperone effects were detectable as early as 1 hour and peaked around 6 hours after treatment. Furthermore, when carbamazepine was removed following a 16-hour treatment, the rescue effect persisted for up to 6 hours post-drug removal. These results provide a potential duration of the surface expression rescue effects of reversible pharmacochaperones.

      Reviewer #1 (Recommendations for the authors):

      The paper is well-written and comprehensive with only very minor essentially copy-editing needed. That said, it would be good if the authors could answer the main points raised above:

      (1) What is the relevant Tanimoto parameters and sequence identity (does this mean structural identity) for the identified compounds?

      As we answered above in response to the overall assessment, to facilitate the identification of novel hits, molecules with greater than 0.5 Tanimoto similarity in ECFP4 space to any known binders of the target protein and its homologs within 70% amino acid sequence identity were excluded from the commercial library. Additionally, after scoring and ranking the molecules by the AtomNet® technology, a diversity clustering was performed on the top 30,000 molecules using the Butina algorithm with a Tanimoto similarity cutoff of 0.35 in ECFP4 space to minimize selection of structurally similar scaffolds for the final compound buy-list.
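
      The Butina diversity step can likewise be sketched in pure Python on toy fingerprints (sets of on-bit indices; hypothetical, not real ECFP4 output). It follows the general Butina procedure (count each molecule's neighbors, rank molecules once by that count, then walk the ranking, taking each still-unassigned molecule as a cluster centroid together with its unassigned neighbors) and is not the authors' code. Treating two molecules as neighbors when their Tanimoto similarity is at least 0.35 is an assumption about how the stated cutoff was applied; toolkits often express the same threshold as a distance, 1 - similarity. Taking the centroid of each cluster gives one representative per group of structurally similar scaffolds.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def butina_clusters(fps, sim_cutoff=0.35):
    """Butina clustering; returns lists of indices, centroid first in each list."""
    n = len(fps)
    # j is a neighbor of i when their similarity reaches the cutoff (assumed >=).
    neighbors = [[j for j in range(n)
                  if j != i and tanimoto(fps[i], fps[j]) >= sim_cutoff]
                 for i in range(n)]
    # Rank potential centroids once, by decreasing neighbor count (stable for ties).
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
    assigned, clusters = set(), []
    for i in order:
        if i in assigned:
            continue
        members = [i] + [j for j in neighbors[i] if j not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters


# Hypothetical toy fingerprints for five ranked molecules.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}, {10, 11, 12}, {10, 11, 13}]
clusters = butina_clusters(fps)
print(clusters)                  # [[0, 1, 2], [3, 4]]
print([c[0] for c in clusters])  # one representative per cluster: [0, 3]
```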

      (2) Did any of the identified putative binders have inhibitory action on channels? If so, does this give clues to separating chaperoning from inhibitory effects?

      Please see response to the same question in the overall assessment above.

      (3) Acknowledge that the identified compounds contain sulfonylurea-like moieties, and address why Aekatperone should (or perhaps does not) offer any advantage over low-affinity sulfonylureas such as tolbutamide.

      Please see response to the same question in the overall assessment above.

      Reviewer #2 (Recommendations for the authors):

      Thank you for assembling the interesting study, which I felt was well designed and communicated. The diverse approaches used in the study, with consistent findings, were definitely a strength. The core findings are also well distilled in the main body of the text, and although there is quite a lot of supplementary information, I felt that it was presented appropriately and well selected in terms of what would be important for readers hoping to learn more. In addition to the questions described above, I only had a few minor editorial issues that could be fixed related to presentation.

      (1) Figure 1B. The colours and resolution of the chemical structures are difficult to see clearly and could be improved.

      We have revised the figure accordingly.

      (2) This is a minor wording point: the first sentence of the Discussion describes the drugs as pancreatic-selective, when it would be clearer to describe them as selective for the pancreatic isoform of KATP (Kir6.2/SUR1), or perhaps better as 'exhibiting ~4-5-fold selectivity for SUR1-containing KATP channels vs. SUR2A or SUR2B'.

      We have changed the wording as suggested.

      (3) As a curiosity (no further experiments are necessary), I wonder whether the authors know if there is any meaningful enhancement of trafficking of WT channels by AKP.

      All pharmacochaperones we have identified to date including Aekatperone also slightly enhance WT channel surface expression (10-20%).

      Reviewing editor recommendations:

      (1) Given the modest resolution of the EM reconstruction, it is perhaps not entirely clear how AKP was assigned to the density observed. Specifically, it would be helpful to include a comparison of an AKP-free map and the current AKP map (filtered to a similar resolution) showing slice views of densities in the region around the inferred binding site. This would be very helpful in ascertaining whether the cryoEM reconstruction is an independent validation of the computational and functional experiments or whether the density inference depends on the additional knowledge.

      We appreciate the editor’s suggestion. We have now added a Supplemental Figure (Supplementary Figure 7 in the revised manuscript) that compares our AKP-free cryoEM density deposited previously to the EMDB (EMD-26320) and the AKP-bound cryoEM density from this study, with cryoEM density (filtered to the same resolution) superimposed on the structural model.

      (2) It could help to mention in brief what is a probable mechanism of AKP inhibition - that is how after binding of AKP, channel opening is restricted. Is it similar to that of other site A ligands?

      Based on the strong Kir6.2 N-terminal cryoEM density observed in our AKP map, AKP most likely inhibits K<sub>ATP</sub> channels by trapping the Kir6.2 N-terminus in the central cavity of SUR1’s ABC core thus preventing Kir6.2-C-terminal domain from rotating to an open conformation, similar to other ligands that stabilize the Kir6.2 N-terminus-SUR1 interface by binding to site A (such as tolbutamide and AKP), site B (such as repaglinide), or both site A and site B (such as glibenclamide). We have now included this in the revised Results and Discussion sections.

      (3) In the context of the MD simulations, do other site A ligands (which from my understanding bind at a similar site) also exhibit similar flexibility as AKP? If there is information available on the flexibility of ligands of varying affinities, bound to the same site, maybe some correlative inferences can be drawn? However, in MD simulation trajectories it is not entirely uncommon for a ligand to simply get trapped in a local energy well. Since the authors have performed significant analysis of their MD results it could be worth mentioning/discussing such phenomena.

      Previously published MD data addressing ligand dynamics, such as glibenclamide in the SUR1 pocket (Walczewska-Szewc K, Nowak W. Photo-Switchable Sulfonylureas Binding to ATP-Sensitive Potassium Channel Reveal the Mechanism of Light-Controlled Insulin Release. J Phys Chem B. 2021, 125: 13111-13121, PMID: 34825567), indicate a certain degree of flexibility. Unfortunately, we cannot directly compare these results, as the simulations were performed without the KNtp domain in the SUR1 cavity, which partially contributes to ligand stabilization. This is an issue we plan to investigate in the future.

      In this study, we ran five independent MD simulations, each 500 ns long, resulting in a total of 2.5 μs of simulation time. Across all replicates, the ligand stayed in the same position, with variations mainly in the dynamics of the blurred segment. Considering the length of the simulations and the consistency across the runs, we believe this binding pose is stable and represents a global (or at least highly stable) energy minimum, consistent with the cryo-EM data.

      (4) In electrophysiological assays, 10 uM AKP seems to inhibit all currents (Figure 2), but in the Rb+ flux assay ~10 uM appears to be the IC50. The reason for this difference is not entirely clear and it would help to comment on this.

      Thank you for noticing the difference. The initial electrophysiological experiments were conducted using the very small amount of AKP provided to us by Atomwise. We estimated the concentration of the reconstituted AKP as best we could, but the concentration was likely not very accurate due to the difficulty of handling the very small amount of AKP powder. Subsequent Rb<sup>+</sup> efflux experiments were conducted using a different, larger batch of AKP we purchased from Enamine. We have now stated this in the Methods section.
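
      A simple concentration-response calculation shows why near-complete block at a nominal 10 µM is hard to reconcile with an IC50 of roughly 9 µM. The sketch assumes a single-site Hill equation with a Hill coefficient of 1; that slope is an illustrative assumption, not a value fitted in this study, and the function names are hypothetical.

```python
def fraction_inhibited(conc_um, ic50_um, hill=1.0):
    """Fractional inhibition from a Hill equation: 1 / (1 + (IC50 / C)**n)."""
    if conc_um <= 0:
        return 0.0
    return 1.0 / (1.0 + (ic50_um / conc_um) ** hill)


def conc_for_fraction(frac, ic50_um, hill=1.0):
    """Concentration that gives fractional inhibition `frac` (0 < frac < 1)."""
    return ic50_um * (frac / (1.0 - frac)) ** (1.0 / hill)


ic50 = 9.0  # µM, approximate IC50 quoted for the Rb+ efflux assay
print(round(fraction_inhibited(10.0, ic50), 2))  # 0.53
print(round(conc_for_fraction(0.95, ic50)))      # 171
```

      Under these assumptions, only about 53% inhibition would be expected at 10 µM, and 95% inhibition would need roughly 170 µM, so the near-total block in the early recordings fits the explanation that the concentration of the first, very small batch was uncertain. A steeper Hill slope would narrow, but not remove, this gap.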

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      As noted above, this paper by Xu et al. reports a new method to combine the analysis of coevolutionary patterns with dynamic profiles to identify functionally important residues and reveal correlations between binding sites.

      Strengths:

      In general, coevolutionary analysis and MD analysis are carried out separately and, while there have been attempts to compare the information provided by the two, no unified framework exists. Here, the authors convincingly demonstrate that integrating signals from dynamics and coevolution gives information that substantially exceeds that provided by either method in isolation. While other methods are useful, they do not capture how dynamics is fundamental to defining function and thus sculpts coevolution via the 3D structure of the protein. At the same time, the authors demonstrate how coevolution in turn also influences internal dynamics. The networks they reconstruct unveil information at an even higher level: the model starts pairwise, but through a network representation the authors arrive at community analysis, reporting on interaction patterns that extend beyond simple pairs.

      Weaknesses:

      The authors should

      - Make an effort to suggest/comment on the limits of applicability of their method;

      We have added a sentence on Page 17, line 15 that describes the limitation of our method.

      - Expand discussion on how DyNoPy compares to other methods;

      A paragraph has been added to explain the comparison with other models (Page 3, line 18).

      - Dynamics is not essential in all systems (e.g., structural proteins): the authors may want to comment on possible strategies they would use for other systems where their framework may not be suitable/applicable.

      We agree with the reviewer that dynamics is not essential in all systems. In systems where dynamics plays a limited role in function, the analysis done with DyNoPy is equivalent to conventional coevolution analysis, which can be considered a limitation of our method. Conversely, for dynamic proteins, combining functional dynamics descriptors with coevolution analysis using DyNoPy helps denoise the information through the deconvolution of communities. We have included this in the manuscript to highlight the suitability/applicability of the method.

      Further, we have added a paragraph in the Introduction and conclusions highlighting the main difference between DyNoPy and existing computational tools like DCCM, KIN, and SPM and for your convenience it is provided below:

      “Functional sites are often regulated by both local and global interactions. Changes in these interactions are instrumental for functional events like substrate binding, catalysis, and conformational changes (18). The development of physical models of protein dynamics and the increase in available computational power have stimulated the adoption of computational techniques (19, 20) to investigate the conformational dynamics of proteins, an essential component of many biological functions (21, 22). Different models have been proposed to describe the interactions between residues during simulations, and network models have been particularly popular, including methods on single structures and MD simulation data built by analysing the response to external forces on residue networks (23), by estimating the prevalence of non-covalent energy interaction networks in homologous proteins (24), or by analysing linear or non-linear correlation in atomic fluctuations (25, 26). These techniques have demonstrated their usefulness in extracting allosteric networks from structural data with applications in enzyme design (26).”

      Reviewer #2 (Public review):

      Summary:

      The authors introduce a computational framework, DyNoPy, that integrates residue coevolution analysis with molecular dynamics (MD) simulations to identify functionally important residues in proteins. DyNoPy identifies key residues and residue-residue couplings to generate an interaction graph, and the authors attempt to validate it using two clinically relevant β-lactamases (SHV-1 and PDC-3).

      Strengths:

      DyNoPy could not only show the clinical relevance of mutations but also predict new potential evolutionary mutations. The authors have provided biologically relevant insights into protein dynamics, which can have potential applications in drug discovery and in understanding molecular evolution.

      Weaknesses:

      Although DyNoPy could show the relevance of key residues in active and non-active site residues, no experiments have been performed to validate their predictions.

      We thank the reviewer for highlighting this point. We acknowledge that direct experimental validation of our predictions for DyNoPy has not yet been performed. However, we have provided explanations and evidence from experiments conducted on closely related homologs to support the relevance of key residues. These homologs share significant structural and functional similarity, which strengthens the reliability of our predictions.

      In addition, they should compare their method with conventional techniques and show how their method could be different.

      We thank all the reviewers for highlighting this oversight on our part. In the Introduction and Conclusions, we have added the following paragraphs:

      “Functional sites are often regulated by both local and global interactions. Changes in these interactions are instrumental for functional events like substrate binding, catalysis, and conformational changes (18). The development of physical models of protein dynamics and the increase in available computational power have stimulated the adoption of computational techniques (19, 20) to investigate the conformational dynamics of proteins, an essential component of many biological functions (21, 22). Different models have been proposed to describe the interactions between residues during simulations, and network models have been particularly popular, including methods on single structures and MD simulation data built by analysing the response to external forces on residue networks (23), by estimating the prevalence of non-covalent energy interaction networks in homologous proteins (24), or by analysing linear or non-linear correlation in atomic fluctuations (25, 26). These techniques have demonstrated their usefulness in extracting allosteric networks from structural data with applications in enzyme design (26).”

      An explanation of the "communities" defined in this work, and of how these communities are relevant to the article, should be provided. In addition, the choice of collective variables and their relevance to residue coupling movement is not very well explained. A dynamic cross-correlation map (DCCM) can also be a good method for understanding residue movements and can explain residue-residue coupling; it is not explained how DyNoPy differs from such conventional methods or whether it performs better.

      The following sentences have been included in the manuscript to address the questions raised by the reviewer:

      On Community Definition and relevance

      DyNoPy identified coevolving residue pairs (scaled coevolution score >1) with interactions strongly correlated with protein functional motions (i.e., J values larger than zero). Applying network analysis to the combined dynamics-coevolution matrix helps us extract higher-order interactions beyond pairwise coupling and detect critical residues, which show multiple interactions with each other. Moreover, indirect long-range relationships, which would be hard to identify from numerical data, could be detected through community clustering. Community-based analysis offers a more comprehensive understanding of residue relationships and enables the visualization of residue couplings on the protein structure.
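
      The filtering-then-grouping idea can be illustrated with a toy example. This sketch is not DyNoPy's implementation: it keeps residue pairs with a scaled coevolution score above 1 and a J value above 0 (the two criteria stated above), builds an undirected graph from them, and then uses connected components as a deliberately simple stand-in for DyNoPy's community clustering, whose algorithm is not described here. Groups with fewer than three residues are discarded, following the minimum size the authors use to define a meaningful community. Residue numbers and scores are hypothetical.

```python
from collections import defaultdict


def coupled_pairs(coevo, j_values, coevo_cutoff=1.0, j_cutoff=0.0):
    """Pairs with scaled coevolution score > coevo_cutoff and J > j_cutoff."""
    return [pair for pair, score in coevo.items()
            if score > coevo_cutoff and j_values.get(pair, 0.0) > j_cutoff]


def residue_groups(pairs, min_size=3):
    """Connected components of the coupling graph with at least min_size residues.

    Connected components are a simple stand-in for a real community-detection step."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        if len(component) >= min_size:
            groups.append(sorted(component))
    return groups


# Hypothetical residue numbers and scores, for illustration only.
coevo = {(10, 20): 1.8, (20, 30): 1.4, (30, 10): 1.2,
         (40, 50): 1.5, (50, 60): 0.7, (70, 80): 2.0}
j_vals = {(10, 20): 0.4, (20, 30): 0.2, (30, 10): 0.1,
          (40, 50): 0.3, (50, 60): 0.5, (70, 80): -0.1}
print(residue_groups(coupled_pairs(coevo, j_vals)))  # [[10, 20, 30]]
```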

      On Choice of collective variables:

      DyNoPy works on the assumption that time-dependent interactions between critical residues, whether or not they undergo significant structural change, will correlate with functional conformational motions. Since MD simulation data are high-dimensional, a time-dependent dynamic descriptor is required to extract the most relevant information for the process under study. A good collective variable (CV) should appropriately describe protein functional motions. Thus, a CV that detects the highest number of residue couplings is expected to be the most suitable descriptor (mentioned on Page 22, Line 14). In our study, we tested 12 CVs, focusing either on the entire protein or on selected regions, and the best-performing CV (the one that identified the most residue couplings) was selected for further analysis. In practical applications, users can decide whether to focus on the most relevant global or local dynamics descriptor depending on the dynamics of their specific system.
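
      The CV selection rule described above (keep the descriptor under which the most residue pairs pass the coupling filters) can be sketched as follows. The per-CV J matrices are taken as precomputed inputs because how J is derived from a CV time series is not detailed here; the CV names, matrices, and helper names are hypothetical, and the thresholds mirror the stated criteria (scaled coevolution score > 1, J > 0).

```python
import numpy as np


def count_couplings(coevo, j_matrix, coevo_cutoff=1.0, j_cutoff=0.0):
    """Count residue pairs (i < j) with coevolution > cutoff and J > cutoff."""
    passed = (coevo > coevo_cutoff) & (j_matrix > j_cutoff)
    return int(np.triu(passed, k=1).sum())  # upper triangle: each pair once


def select_cv(coevo, j_by_cv):
    """Return the CV whose J matrix yields the most couplings, plus all counts."""
    counts = {name: count_couplings(coevo, j) for name, j in j_by_cv.items()}
    best = max(counts, key=counts.get)
    return best, counts


# Hypothetical 3-residue example; CV names and values are made up.
coevo = np.array([[0.0, 1.5, 1.2],
                  [1.5, 0.0, 0.8],
                  [1.2, 0.8, 0.0]])
j_by_cv = {
    "global_rmsd": np.array([[0.0, 0.3, -0.2], [0.3, 0.0, 0.5], [-0.2, 0.5, 0.0]]),
    "loop_distance": np.array([[0.0, 0.4, 0.1], [0.4, 0.0, 0.2], [0.1, 0.2, 0.0]]),
}
print(select_cv(coevo, j_by_cv))  # ('loop_distance', {'global_rmsd': 1, 'loop_distance': 2})
```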

      We have added a paragraph in the Introduction differentiating DyNoPy from other methods, including DCCM. DCCM differs from DyNoPy in two respects: (1) it does not account for inter-residue coevolution; and (2) its correlation matrix captures correlations of atomic/residue movements associated with the whole intrinsic dynamics of the system, without filtering for the contributions to the important motions involved in the biological function. Additionally, any residue pair contributing to functional motion without itself undergoing any structural change will not be visible in this approach.
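
      For comparison, a conventional DCCM can be computed directly from superposed trajectory coordinates, as in the sketch below. It uses the standard normalized covariance of displacements from the mean structure, C_ij = mean(Δr_i · Δr_j) / sqrt(mean(|Δr_i|^2) · mean(|Δr_j|^2)), over all frames; nothing in it refers to coevolution or to a functional collective variable, which is the contrast drawn above. The input layout (frames × atoms × 3, already aligned, e.g. one Cα per residue) and the function name are assumptions.

```python
import numpy as np


def dccm(coords):
    """Dynamic cross-correlation matrix from coordinates of shape (frames, atoms, 3).

    Assumes the frames are already superposed onto a common reference."""
    coords = np.asarray(coords, dtype=float)
    disp = coords - coords.mean(axis=0)                   # displacement from mean structure
    cov = np.einsum("tid,tjd->ij", disp, disp) / coords.shape[0]  # mean(dr_i . dr_j)
    scale = np.sqrt(np.diag(cov))                         # RMS fluctuation of each atom
    return cov / np.outer(scale, scale)  # in [-1, 1]; atoms that never move give NaN


# Two atoms moving in opposite directions along x over four frames.
traj = np.zeros((4, 2, 3))
traj[:, 0, 0] = [0.0, 1.0, 0.0, -1.0]
traj[:, 1, 0] = [0.0, -1.0, 0.0, 1.0]
print(dccm(traj)[0, 1])  # approximately -1.0 (perfectly anticorrelated)
```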

      In the sentence "DyNoPy identified eight significant communities of strongly coupled residues within SHV-1 (Supporting Fig. S4A)" I could not find a clear description of eight significant communities.

      The following sentences have been included in the results, methods and figure legends that define ‘significant community’:

      ‘DyNoPy identified eight meaningful communities, each consisting of at least three strongly coupled residues within SHV-1 (Supplementary Fig. S4A). All crucial catalytic residues and critical substitution sites previously mentioned participate in one of these communities, with the exceptions of R<sub>43</sub>, R<sub>202</sub>, and S<sub>130</sub>.’ (Page 8 Line 28)

      ‘A meaningful community should contain at least three residues.’ (Page 21 Line 2)

      ‘A reasonable residue community should contain at least three residues.’ (SI Page 11)

      Again the description of communities is not clear to me in the following sentence "Detailed description of the other three communities is provided in the supporting information (Fig. S6)."

      This sentence has been rewritten:

      ‘Detailed description of communities with secondary importance for protein function (communities 3, 8, and 9) is provided in the supplementary information (Supplementary Fig. S6).’ (Page 9 Line 8)

      In the sentence "N170 acts as an intermediary between N136 and E166", kindly cite the reference figure showing N170 as the intermediate residue.

      This sentence has been rewritten to avoid any confusion.

      ‘Although DyNoPy did not detect this direct interaction between N136 and E166, the established relationship between N136 and N170 highlights the role of N136 in influencing E166.’ (Page 10 Line 8)

      Please be careful with the numbers. In the sentence "These residues not only interact with each other directly but are also indirectly coupled via 21 other residues.", I counted 22 other residues, not 21.

      We thank the reviewer for spotting this error, which has now been corrected. We have also recounted the residues in all communities.

      ‘These residues not only interact with each other directly but are also indirectly coupled via 22 other residues.’ (Page 12 Line 14)

      In the sentence "Unlike other substitution sites that are adjacent to the active site, R<sub>205</sub> is situated more than 16 Å away from catalytic serine S<sub>70</sub>". Please add this label somewhere in the figure.

      The figure legends have been updated to include this. Distances have been added to community 4 (Fig. 3) and community 6 (Fig. 4). Residue indices in the legend of Fig. 3 are now given as subscripts, and the distances quoted in the main text have been updated to be more accurate.

      ‘G<sub>156</sub> and A<sub>146</sub> are two functionally important residues distant from the active site. G<sub>156</sub> is 21.3 Å away from the catalytic S<sub>70</sub>, and A<sub>146</sub> is 16.8 Å away from S<sub>70</sub>.’ (Page 12 Line 2)

      ‘R<sub>205</sub> is a functionally important residue that is 20.6 Å away from the active-site S<sub>70</sub>.’ (Page 13 Line 10)

      Please cite a reference in the sentence "This indicates that mutations on G238 would result in an alteration on protein catalytic function, as well as an increased flexibility of the protein, which strongly aligns with previous finding."

      The citation has been added:

      ‘This indicates that mutations at G238 would alter the protein’s catalytic function and increase its flexibility, which strongly aligns with previous findings (62).’ (Page 15 Line 2)

      Reviewer #3 (Public review):

      Summary:

      In this paper, Xu, Dantu and coworkers report a protocol for analyzing coevolutionary and dynamical information to identify a subset of communities that capture functionally relevant sites in beta-lactamases.

      Strengths:

      The combination of coevolutionary information and metrics from MD simulations is interesting for capturing functionally relevant sites, which can have implications in the fields of drug discovery but also in protein design.

      Weaknesses:

      The combination of coevolutionary information and metrics from MD simulations is not new, as other protocols have been proposed over the years (the current version of the paper neglects some of them, see below), and there are a few parameters of the protocol that, in my opinion, should be better analyzed and discussed.

      (1) As mentioned, the introduction of the paper lacks some important publications in the field of using graph theory to represent important interaction networks extracted from MD simulations (DOI: 10.1002/pro.4911), and also combining MD data with MSA to identify functionally relevant sites for enzyme design (doi: 10.1021/acscatal.4c04587, 10.1093/protein/gzae005).

      We are very grateful to the reviewer for pointing us to these references. We have added a paragraph to the Introduction mentioning these and other computational tools similar to DyNoPy. Further, in the Conclusions we have highlighted the differences between DyNoPy and existing tools.

      (2) The matrix used to apply graph theory (J_ij) is built by summing the scaled coevolution and degree-of-correlation values. The alpha and beta weights are defined, and the authors mention that alpha is set to 0.5, so beta is also 0.5 to satisfy alpha + beta = 1. Why was a value of 0.5 selected? How does this affect the overall results and the conclusions drawn? The finding that many catalytically relevant residues are identified in the communities is not surprising, given that such sites usually have a high conservation score.

      This is an excellent question. Our present formulation allows the user to easily assess the influence of coevolution and dynamic couplings on the output. Setting alpha to 0.5 weights evolutionary and dynamic information equally and has shown promising results for SHV-1 and PDC-3. As presented in the manuscript, setting alpha to 1 (i.e., using coevolution data alone) does not allow us to identify critical residues effectively, as all residues are included in the set (Supplementary Fig. S4 and S5). In future work, we would like to investigate the effect of scanning alpha from 0 to 1 on the final residue list, possibly on a larger set of proteins and protein families.

      We would also like to point out that some of the residue pairs with coevolution scores in the top 1% have J-scores set to 0, as they lacked significant coupling to the functional dynamics.
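      To make the weighting concrete, the alpha/beta combination can be sketched in Python. This is a minimal illustration, not the DyNoPy code: the min-max scaling and the function names `min_max_scale` and `combined_coupling` are assumptions, and the step that sets J-scores to 0 for pairs lacking significant coupling to the functional dynamics is deliberately not modelled, because its exact rule is not given here.

```python
def min_max_scale(matrix):
    """Scale all entries of a square matrix (list of lists) to [0, 1].

    Min-max scaling is an assumption made for this sketch only.
    """
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant matrix carries no relative information.
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / span for v in row] for row in matrix]


def combined_coupling(coevolution, dynamic, alpha=0.5):
    """Hypothetical J_ij = alpha * C_ij + beta * D_ij, with beta = 1 - alpha.

    coevolution, dynamic: square matrices of raw pairwise scores.
    alpha = 0.5 weights both sources equally; alpha = 1 uses coevolution only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha  # enforces alpha + beta = 1
    c = min_max_scale(coevolution)
    d = min_max_scale(dynamic)
    n = len(c)
    return [[alpha * c[i][j] + beta * d[i][j] for j in range(n)]
            for i in range(n)]
```

      Scanning `alpha` from 0 to 1 with such a function would be one simple way to explore the sensitivity analysis mentioned above.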

      (3) Another important point that needs further explanation is the selection of the relevant descriptor of protein dynamics. In this study two different strategies have been used (one more global, the other more local), but more details should be provided regarding their choice. What is the best strategy according to the authors? Why not use the same strategy for both related systems? The results obtained with one methodology or the other will have a large impact on the dynamical score. A related point is: what is the impact of the MD simulation length, of how the MSA is generated, and of the number of sequences used for MSA construction?

      As in many complex proteins, information in β-lactamases flows via structural interactions (https://doi.org/10.7554/eLife.66567). These interactions occur at a local level, as with binding-site residues or residues immediately surrounding the binding site, but there are also interactions far (>20 Å) from the binding site that can alter function. We have obtained this information from extensive surveys of clinical isolates and experimental data. To account for such interactions, a more global approach has to be taken. To answer the reviewer’s question: each system is unique and there is no single fixed strategy. In short, the method used should be able to denoise the information, and users are advised to fine-tune their findings by corroborating them with experimental and clinical information.

      The length of MD simulations is also system-specific. Some systems sample the functional dynamics effectively within a shorter simulation time, while others require long-timescale MD simulations to converge. The results will not change as long as the simulation has effectively sampled the functional dynamics associated with biological function.

      The MSA is generated by the HH-Suite package as mentioned on Page 19 Line 19. More specifically, the MSA is constructed based on the UniRef30 database, where sequences are clustered, and each cluster contains sequences with at least 30% sequence identity. This provides a non-redundant set of protein sequences. Our package allows the automatic generation of MSAs from the database. For SHV-1, the alignment contains 18,175 protein sequences and for PDC-3, the alignment consists of 27,892 protein sequences. Full details of this protocol are published in Bibik et al. (https://doi.org/10.1093/bioinformatics/btae166). We have revised the methods section to include these details.

      Other Minor Alterations

      (1) ‘Fig. S1 and S2’ has been changed to ‘Supplementary Fig. S1 and S2’ for consistency (Page 6 Line 12)

      (2) ‘Figure 5B’ has been changed to ‘Fig. 5B’ for consistency (Page 16 Line 11)

      (3) All instances of ‘Figure’ have been changed to ‘Fig.’ in the SI for consistency

      (4) As suggested, Step 1 of Fig. 1 has been modified.

    1. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary:

      In this manuscript, Hammond et al. study the robustness of the vertebrate segmentation clock against morphogenetic processes such as cell ingression, cell movement and cell division, to ask whether the segmentation clock and morphogenesis are modular. The modularity of these two processes would be important for the evolvability of the segmenting system. The authors adopt a previously proposed 3D model of the presomitic mesoderm (Uriu et al. 2021 eLife) and include new elements: different types of cell ingression, tissue compaction and cell cycles. Based on numerical simulations showing that synchrony of the segmentation clock is robust, the authors conclude that the segmentation clock and morphogenetic processes are modular. The presented results support the conclusion. The manuscript is clearly written. I have several comments that could help the authors further strengthen their arguments.

      Major comment: 

      [Optional] In both the current model and Uriu et al. 2021, coupling delay in the phase oscillator model is not considered. Given that several previous studies (e.g. Lewis 2003, Herrgen et al. 2010, Yoshioka-Kobayashi et al. 2020) suggested the presence of coupling delays in Delta-Notch signaling, could the authors analyze the effect of coupling delay on the robustness of the segmentation clock against morphogenetic processes?

      We thank the reviewer for the suggestion. Owing to the computational demands of including such a delay in the model, we cannot feasibly repeat every simulation analysed here in the presence of delay; we note that the increased computational demand that delays place on the simulations is also why Uriu et al. (2021) did not include them, as stated in their published exchange with reviewers. However, analogous to our analysis in figure 7, we can analyse how varying the position of progenitor cell ingression affects synchrony in the presence of the coupling delay measured in zebrafish by Herrgen et al. (2010). We show this analysis in a new figure 8 (specifically 8B) on page 21, and discuss its implications in the text on pages 20-22. Our analysis reveals that the model cannot recover synchrony using the default parameters of Uriu et al. (2021), and shows a much stronger dependence on the rate of cell mixing (vs) than in the instantaneous coupling case (cf. figure 7). However, by systematically varying the value of the delay, we find that a relatively minor increase in the delay is sufficient to recover synchrony using the parameter set of Uriu et al. (see figure 8C). Repeating this across the three scenarios of cell ingression, we see that the combination of coupling strength and delay determines the robustness of synchrony to varying the position of cell ingression. This suggests that the combination of these two parameters constrains the evolution of morphogenesis.
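      For readers less familiar with delayed coupling, a generic phase oscillator equation with a coupling delay is shown below. This is an assumed standard form in the spirit of the delayed-coupling studies cited above, not necessarily the exact equation implemented in the manuscript: θ_i is the clock phase of cell i, ω the intrinsic frequency, κ the coupling strength, n_i the number of neighbours j of cell i within the coupling range, τ the coupling delay, and ξ_i(t) a noise term.

```latex
$$
\frac{d\theta_i}{dt} = \omega
  + \frac{\kappa}{n_i} \sum_{j=1}^{n_i} \sin\left[\theta_j(t - \tau) - \theta_i(t)\right]
  + \xi_i(t)
$$
```

      Setting τ = 0 recovers the instantaneous coupling case; for τ > 0 each cell responds to its neighbours’ phases as they were a time τ earlier, which is why a history of past phases must be stored and why delayed simulations are more computationally demanding.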

      Minor comments: 

      -  PSM radius and oscillation synchrony are both denoted by the same letter r. The authors should use different letters for these two to avoid confusion.

      We thank the reviewer for spotting this. This has now been changed throughout to rT, as shorthand for ‘radius of tissue’.

      -  page 5 Figure 1 caption: (x-x_a/L) should be (x-x_a)/L.

      We thank the reviewer for spotting this. This has now been corrected.

      -  Figure 3C: Description of black crosses in the panels is required in the figure legend.

      Thank you for spotting this. The legend has now been corrected.

      -  Figure 3C another comment: In this panel, synchrony r at the anterior PSM is shown. It is true that synchrony at anterior PSM is most relevant for normal segment formation. However, in this case, the mobility profile is changed, so it may be appropriate to show how synchrony at mid and posterior PSM would depend on changes in mobility profile. Is synchrony improved by cell mobility at the region where cell ingression happens?

      We thank the reviewer for the suggestion. We have now plotted the synchrony along the AP axis for varying motility profiles (figure 3 supplement 1), and briefly discuss this in the text on page 11. While synchrony varies with x-position (as expected; see figure 2), there is no trend associated with the shape of the motility profile.

      -  In page 12, the authors state that "the results for the DP and DP+LV cases are exactly equal for L = 185 um, as .... and the two ingression methods are numerically equivalent in the model". I understood that in this case two ingression methods are equivalent, but I do not understand why the results are "exactly" equal, given the presence of stochasticity in the model.

      These results can be exactly equal despite the simulations being stochastic because they were both initialised using the same ‘seed’ in the source code. However, we now see that this might be confusing to the reader, and we have re-generated this figure but this time initialising the simulations for each ingression scenario using a different seed value. This is now reflected in the text on page 12 and in figure 4.
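      The role of the seed can be illustrated with a minimal Python sketch. The function `simulate_phases` is a hypothetical stand-in for one stochastic run (here it only draws initial clock phases); it is not taken from the published code. Two runs with the same seed reproduce each other exactly, whereas different seeds give independent realisations.

```python
import math
import random


def simulate_phases(n_cells, seed):
    """Hypothetical stand-in for one stochastic simulation run.

    Draws initial clock phases in [0, 2*pi] from a dedicated, seeded
    generator, so the result does not depend on global random state.
    """
    rng = random.Random(seed)
    return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_cells)]


# Same seed: identical "simulations", even though each run is stochastic.
run_a = simulate_phases(5, seed=42)
run_b = simulate_phases(5, seed=42)
# Different seeds: independent realisations of the same stochastic model.
run_c = simulate_phases(5, seed=43)
```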

      -  The authors analyze the effect of cell density on oscillation synchrony in Fig. 4 and they mention that higher density increases robustness of the clock by increasing the average number of interacting neighbours. I think it would be helpful to plot the average number of neighbouring cells in simulations as a function of density to quantitatively support the claim.

      We thank the reviewer for their suggestion. Distributions of neighbour numbers for exemplar simulations with varying density can now be found in figure 4 supplement 1 and are referred to in the text on page 11.

      -  The authors analyze the effect of PSM length on synchrony in Fig. 4. I think kymographs of synchrony r as shown in Fig. 2D would also be helpful to show that indeed cells get synchronized while advecting through a longer PSM.

      We thank the reviewer for their suggestion and agree that visualising the data in this way is an excellent idea. We have generated the suggested kymographs and added them to figure 4 as supplements 2 and 4, and discussed these results in the text on page 12.

      -  I understand that cells in M phase can interact with neighboring cells with the same coupling strength kappa in the model, although their clocks are arrested. If so, this aspect should also be mentioned in the main text on page 16, as this coupling can be another source of noise for synchrony.

      We agree this is an important clarification. We explicitly state this, and briefly justify our choice, in the text on page 16.

      -  Figure 5-figure supplement 2: panel labels A, B, C are missing. 

      Thank you for bringing this to our attention. These have now been added.

      -  Figure 5-figure supplement 3: panel labels A, B, C are missing.

      Thank you for bringing this to our attention. These have now been added.

      Reviewer #1 (Significance):

      Synchronization of the segmentation clock has been studied by mathematical modeling, but most previous studies considered cells in a static tissue without morphogenesis. In the previous study by Uriu et al. 2021, morphogenetic processes such as cell advection due to tissue elongation, tissue shortening, and cell mobility were considered in synchronization. The current manuscript provides methodological advances in this respect by newly including cell ingression, tissue compaction and the cell cycle. In addition, the authors bring the concepts of modularity and evolvability to the field of the vertebrate segmentation clock, which is new. On the other hand, the manuscript confirms through careful simulations that the synchronization of the segmentation clock is robust, but it does not propose or reveal new mechanisms that make it robust or modular. The main audience of the manuscript will be researchers working on somitogenesis and evolutionary biologists interested in the evolution of developmental systems. The manuscript will also interest broader audiences, such as developmental biologists, biophysicists, and physicists and computer scientists working on dynamical systems.

      We thank the reviewer for their interest in our manuscript and for acknowledging us as one of the first to address the modularity and evolvability of somitogenesis. We hope that this work will encourage others to think about these concepts in this system too.  

      In the original submission, we identified a sufficiently high coupling strength as the main mechanism underlying the identified modularity in somitogenesis. Since then, we have included an analysis of the coupling delay and find that it is the interplay between coupling strength and coupling delay that mediates the identified modularity, allowing PSM morphogenesis and the segmentation clock to evolve independently in regions of parameter space that are constrained and determined by this interplay. We have now added an extra figure (figure 8) in which we explore this interplay, and we discuss it at length in the last section of the Results and in the Discussion. We again thank the reviewer for encouraging us to include delays in our analysis.

      Reviewer #2 (Evidence, reproducibility and clarity):

      SUMMARY 

      The manuscript from Hammond et al., investigates the modularity of the segmentation clock and morphogenesis in early vertebrate development, focusing on how these processes might independently evolve to influence the diversity of segment numbers across vertebrates.

      Methodology: The study uses a previously published computational model, parameterized for zebrafish, to simulate and analyse the interactions between the segmentation clock and the morphogenesis of the pre-somitic mesoderm (PSM). Their model integrates cell advection, motility, compaction, cell division, and the synchronization of the embryo clock. Three alternative scenarios of PSM morphogenesis were modeled to examine how these changes affect the segmentation clock.

      Model System: The computational model system combines a representation of cell movements and the phase oscillator dynamics of the segmentation clock within a three-dimensional horseshoe-shaped domain mimicking the geometry of the vertebrate embryo PSM. The parameters used for the mathematical model are mostly estimated from previously published experimental findings.

      Key Findings and Conclusions: (1) The segmentation clock was found to be broadly robust against variations in morphogenetic processes such as cell ingression and motility; (2) Changes in the length of the PSM and the strength of phase coupling within the clock significantly influenced the system's robustness; (3) The authors conclude that the segmentation clock and PSM morphogenesis exhibited developmental modularity (i.e. relative independence), allowing these two phenomena to evolve independently, and therefore possibly contributing to the diverse segment numbers observed in vertebrates.

      MAJOR COMMENTS

      (1) The key conclusion drawn by the authors (that there is robustness, and therefore modularity, between the morphogenetic cellular processes modeled and the embryo clock synchronization) stems directly from the modeling results appropriately presented and discussed in the manuscript. The model comprises some strong assumptions, however all have been clearly explained and the parameterization choices are supported by experimental findings, providing biological meaning to the model. Estimated parameters are well explained and seem reasonable assumptions (from the embryology perspective).

      We thank the reviewer for their positive comments about our work.

      (2) This study, as is, achieves its proposed goal of evaluating the potential robustness of the embryo clock to changes in (some) morphogenetic processes. The authors do not claim that the model used is complete, and they properly identify some limitations, including the lack of cell-cell interactions. Given the recognized importance of cellular physical interactions for successful embryo development, including them in the model would be a significant addition in future studies.

      We would like to clarify that the model does include cell-cell interactions as cells interact with their neighbours’ clock phase to synchronise and to avoid occupying the same physical space. 

      (3) The authors have deposited all the code used for analysis in a public GitHub repository that is updated and available for the research community.

      We support open source coding practices.

      (4) On page 6, the authors justify their choice of clock parameters for cells ingressing the PSM: "As ingressing cells do not appear to express segmentation clock genes (Mara et al. (2007)), the position at which cells ingress into the PSM can create challenges for clock patterning, as only in the 'off' phase of the clock will ingressing cells be in-phase with their neighbours." However, there are several lines of evidence (in chick and mouse) that some oscillatory clock genes are already expressed as early as the gastrulation phase, i.e. prior to PSM ingression (Freitas et al., 2001 [10.1242/dev.128.24.5139]; Jouve et al., 2002 [10.1242/dev.129.5.1107]; Maia-Fernandes et al., 2024 [10.1371/journal.pone.0297853]). Question: Is this also true in zebrafish? (I.e. is there any recent experimental evidence that the clock genes are not expressed at ingression, since the paper cited to support this assumption is from 2007?) If they are expressed in zebrafish (as they are in mouse and chick), then the added cells should have random clock gene periods when they enter the PSM, rather than all starting with a constant initial phase of zero. Probably this will not impact the results, since the cells will also be out of phase with their neighbours when they "ingress"; however, it will model the biological scenario more closely (and avoid such criticism).

      We thank the reviewer for their comments. While it is known that in zebrafish the clock begins oscillating during epiboly and before the onset of segmentation (Riedel-Kruse et al., 2007), to our knowledge no-one has examined whether posteriorly or laterally ingressing progenitor cells express clock genes prior to their ingression into the PSM, which occurs later in development than the first oscillations which give rise to the first somites. We have not found any published evidence of her/hes gene expression in the dorsal donor tissues or lateral tissues surrounding the PSM, however we acknowledge that this has not been actively studied before and our assumption relies on an absence of evidence, rather than evidence of absence. 

      However, we agree with the reviewer that one should include such an analysis for completeness, and we have now generated additional simulations where progenitor cells ingress with a random clock phase. This data is presented in figure 2 supplement 1 and mentioned in the main text on page 9.

      MINOR COMMENTS 

      (1) The citations are appropriate and cover the major labs that have published work related to this study (although with some overrepresentation of the lab that published the model used).

      We have cited the vast literature on somitogenesis to the best of our ability and do recognise that the work of the Oates lab appears prominently, but this is probably because their experimental data were originally used to parametrise the model in Uriu et al. 2021.

      (2) The text is clear, carefully written, and both the methods and the reasoning behind them are clearly explained and supported by proper citations.

      We are very glad to see that the reviewer found that the manuscript was clearly presented.

      (3) The figures are comprehensive, properly annotated, with explanatory self-contained legends. I have no comments regarding the presentation of the results.

      Thank you.

      (4) Minor suggestions: 

      a. Page 26: In the Cell addition sub-section of the Methods section, correct all instances where the word domain is used, but subdomain should be used (for clarity and coherence with the description of the model, stated as having a single domain comprising 3 subdomains).

      We thank the reviewer for raising this, this is a good point. We have now corrected to ‘subdomain’ where appropriate.

      b. Page 32: Table 1. Parameter values used in our work, unless otherwise stated -> Suggestion: Add a column with the individual citations used for each parameter (to facilitate the confirmation of each corresponding reference).

      Thank you for the suggestion; we have now done this (see table 1, page 36).

      Reviewer #2 (Significance):

      GENERAL ASSESSMENT 

      This study uses a previously published model to simulate alternative scenarios of morphogenetic parameters to infer the potential independence (termed here modularity) between the segmentation clock and a set of morphogenetic processes, arguing that such modularity could allow the evolution of more flexible body plans, therefore partially explaining the variability in the number of segments observed in the vertebrates. This question is fundamental and relevant, yet still poorly researched. This work provides a comprehensive simulation with a model that tries to simplify the many morphogenetic processes described in the literature, reducing them to a few core fundamental processes that allow the sought conclusions to be drawn. It provides theoretical insight to support a conceptual advance in the field of evolutionary vertebrate embryology.

      ADVANCE

      This study builds on a model recently published by Uriu et al. (eLife, 2021) that incorporates quantitative experimental data within a modeling framework including cell and tissue-level parameters, allowing the study of multiscale phenomena active during zebrafish embryo segmentation. Uriu's publication reports many relevant and often non-intuitive insights uncovered by the model, most notably the description of phase vortices formed by the synchronizing genetic oscillators interfering with the traveling-wave front pattern.  However, this model can be further explored to ask additional questions beyond those described in the original paper. A good example is the present study, which uses this mathematical framework to investigate the potential independence between two of the modeled processes, thereby extracting extra knowledge from it. Accordingly, the present study represents a step forward in the direction of using relevant theoretical frameworks to quantitatively explore the landscape of complex molecular hypotheses in silico, and with it shed some light on fundamental open questions or inform the design of future experiments in the lab.

      The study incorporates a wide range of existing literature on the developmental biology of vertebrates. It comprehensively cites prior work, such as the foundational studies by Cooke and Zeeman on the segmentation clock and the role of FGF signaling in PSM development as discussed by Gomez et al. The literature properly covers the breadth of knowledge in this field.

      AUDIENCE

      Target audience | This study is relevant for fundamental research in developmental biology, specifically targeting researchers who focus on early embryo development and morphogenesis from both experimental and theoretical perspectives. It is also relevant for evolutionary biologists investigating the genetic factors that influence vertebrate evolution, as well as to computational biologists and bioinformatics researchers studying developmental processes and embryology.

      Developmental researchers studying the segmentation clock in other vertebrate model organisms (namely mouse and chick), will find this publication especially valuable since it provides insights that can help them formulate new hypotheses to elucidate the molecular mechanisms of the clock (for example finding a set of evolutionarily divergent genes that might interfere with PSM length). Additionally, this study provides a set of cellular parameters that have yet to be measured in mouse and chick, therefore guiding the design of future experiments to measure them, allowing the simulation of the same model with sets of parameters from different vertebrate model organisms, therefore testing the robustness of the findings reported for zebrafish.

      Reviewer #3 (Evidence, reproducibility and clarity): 

      In this manuscript, Verd and colleagues explored how various biologically relevant factors influence the robustness of clock synchronization among neighboring cells in the context of somitogenesis, adapting a mathematical model presented by Uriu et al. in 2021 in a similar context. Specifically, they show that clock dynamics are robust to different biological mechanisms such as cell ingression, cellular motility, compaction-extension and cell division. On the other hand, the length of the presomitic mesoderm (PSM) and the density of cells in it play a significant role in the robustness of clock dynamics. While the manuscript is well written and provides clear descriptions of methods and technical details, it tends to be somewhat lengthy.

      Below are the comments I would like the authors to address:

      (1) The authors mention that "...the model is three dimensional and so can quantitatively recapture the rates of cell mixing that we observe in the PSM". I am not convinced by this justification for using a 3D model. None of the effects the authors explore in this manuscript requires a three-dimensional model or a full physical description of the cellular mechanics, such as excluded-volume interactions. A one-dimensional model characterized by cell position along the arclength of the PSM and somitic region and by the segmentation clock phase θ could incorporate all the physics the authors describe in this manuscript, while being significantly cheaper computationally, allowing the authors to explore the effect of different parameters in greater detail.

      One of the main objectives of the work we present in this manuscript is to assess how the evolution of PSM morphogenesis affects, or does not affect, segment patterning. The PSM is a three-dimensional tissue with differing cell rearrangement dynamics along its anterior-posterior axis. In addition, PSM dimension, density, the rearrangement rate, and patterns of cell ingression all vary across vertebrate species, and they are functional, especially cell mixing as it promotes synchronisation and drives elongation. In order to answer questions on the modularity of somitogenesis we therefore consider it absolutely necessary to include a three-dimensional representation of the PSM that captures single cells and their movements. In addition, this will allow us, as Reviewer #2 also pointed out, to reparametrize our model using species-specific data as it becomes available. 

      While the reviewer is right in that lower dimensional representations would be computationally more efficient, and are generally more tractable, it would not be possible to represent cell mixing in one dimension, as this happens in three dimensions. One could perhaps encode the synchrony-promoting effect of cell mixing via some coupling function κ(x) that increases towards the posterior, however it is unclear what existing biological data one could use to parameterise this function or determine its form. Cell mixing can be modelled in a two-dimensional framework, however this cannot quantitatively recapture the rate of cell mixing observed in vivo, which is an advantage of this model. 

      Furthermore, it is unclear how one would simulate processes such as compaction-extension using a one-dimensional model. The two different scenarios of cell ingression that we consider also cannot be replicated in a one-dimensional model: a population of cells re-acquiring synchrony on the dorsal surface of the tissue while new material is added to the ventral side, creating asynchrony, is qualitatively different from a one-dimensional scenario in which cells are introduced continuously along the spatial axis.

      (2) I am not sure about the justification for limiting the quantification of phase synchrony to a very narrow (one-cell-diameter-wide) region at one end of the somitic part (Page 33, below Fig. 9). From my understanding of the manuscript, the segments appear over a significant length anterior to this region. Wouldn't an ensemble average over multiple such one-cell-diameter-wide regions in the somitic region be a more accurate metric for quantifying synchrony?

      Indeed, such a metric (e.g. that used by Uriu et al. to quantify synchrony along the x-axis) would be more accurate for determining synchrony within the PSM as a whole. However, according to the clock and wavefront model of somitogenesis, only synchrony at the very anterior of the PSM (or, equivalently, at the wavefront) is functional for somitogenesis and thus for evolution. We therefore restrict our analysis to the anterior-most region of the PSM. We now further justify this in the main text on page 9.
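      To make the distinction in this exchange concrete, the following Python sketch contrasts a synchrony measure restricted to a single strip with an ensemble average over adjacent strips of the same width. It assumes that synchrony is quantified with the Kuramoto order parameter and that each cell has a position x along the anterior-posterior axis and a clock phase θ; the function names, the strip layout and the orientation of x are illustrative assumptions and are not taken from the model code.

```python
import numpy as np

def kuramoto_order(phases):
    """Kuramoto order parameter r = |mean(exp(i*theta))|, in [0, 1]."""
    return float(np.abs(np.mean(np.exp(1j * np.asarray(phases)))))

def strip_synchrony(x, theta, x_lo, width):
    """Order parameter of cells whose position x lies in [x_lo, x_lo + width)."""
    mask = (x >= x_lo) & (x < x_lo + width)
    if not mask.any():
        return float("nan")  # empty strip: synchrony is undefined
    return kuramoto_order(theta[mask])

def ensemble_synchrony(x, theta, x_min, x_max, width):
    """Mean order parameter over adjacent equal-width strips tiling [x_min, x_max)."""
    starts = np.arange(x_min, x_max, width)
    return float(np.nanmean([strip_synchrony(x, theta, s, width) for s in starts]))

# Toy example: a synchronised strip next to a strip with opposing phases.
x = np.array([0.0, 0.5, 1.5, 1.7])
theta = np.array([0.1, 0.1, 0.0, np.pi])
print(strip_synchrony(x, theta, 0.0, 1.0), ensemble_synchrony(x, theta, 0.0, 2.0, 1.0))
```

      In this sketch, the single-strip measure corresponds to calling strip_synchrony at the anterior boundary with a width of one cell diameter, and the Reviewer's alternative corresponds to ensemble_synchrony over the whole somitic region. The two measures differ whenever synchrony varies along the axis.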

      (3) In studying the effect of cellular ingression, the authors consider three discrete modes (random, DP, and DP+LV) and show that clock synchrony is affected in the DP+LV mode. I would like the authors to explore this continuously, from pure DP ingression to pure LV ingression, including intermediates.

      We thank the reviewer for this suggestion; this is a very interesting question. We are currently working on a related computational and experimental project to address how PSM morphogenesis can change over evolutionary time to give rise to the different modes that we see across species. As part of this work, we are running precisely the simulations suggested by the reviewer to find regions of parameter space in which all the relevant morphogenetic processes can freely evolve. While interesting, this work is, however, outside the scope of the current manuscript.

      (4) In studying the effect of PSM length and cell density on cellular synchrony, the authors restrict themselves to three values of density and six values of PSM length, keeping the other parameter constant. I would be interested to see a phase diagram similar to Fig. 7 in the two-dimensional parameter space of L and ρ0. I am curious whether a scaling relation exists for the parameter values that partition the parameter space into regions with and without synchrony.

      We thank the reviewer for their suggestion and agree that this would constitute an interesting addition to the manuscript. We have now generated these data, which are shown in figure 4 supplement 5 and mentioned on page 13. We see no clear relationship between these two variables when co-varying in the presence of random ingression. 

      (5) In both the abstract and the introduction, the authors discuss at great length the variability in the number of segments. I am curious how the number and width of the observed segments depend on the different parameters related to cellular mechanics and the segmentation clock.

      We thank the reviewer for this question. It was not clear to us whether this was something the reviewer wanted us to address in the study’s background and introduction, or an analysis we should include in the results. Therefore, we have responded to both comprehensively below:

      The prevailing conceptual framework for understanding this is the clock and wavefront model (Cooke and Zeeman, 1976), which posits that the somite length is inversely proportional to the frequency of the clock relative to the speed of the wavefront, and that the total number of segments is the relative frequency multiplied by the total duration of somitogenesis.
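      As an illustration, the two statements can be written out under simplifying assumptions: a constant clock frequency $f$ at which segments are laid down at the wavefront, a constant wavefront speed $v$ relative to the tissue, and a total duration $T$ of somitogenesis. These symbols are introduced here for illustration only.

```latex
S = \frac{v}{f}, \qquad N = f\,T
\qquad \Bigl(\text{more generally } N = \int_{0}^{T} f(t)\,\mathrm{d}t\Bigr)
```

      Here $S$ is somite length and $N$ is the total number of segments. With purely hypothetical values of a 30 min clock period ($f = 2\,\mathrm{h}^{-1}$) and $T = 15\,\mathrm{h}$, this gives $N = 30$. Changing $T$, for example through altered elongation dynamics, changes $N$ without requiring any change to the clock itself.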

      Experimentally we know that the frequency is determined in part by the coupling strength (Liao, Jorg, and Oates, 2016), and from comparative embryological studies (Gomez et al., 2008; Steventon et al., 2016) we know that changes in the elongation dynamics of the PSM correlate with changes in somite number, presumably by altering the total duration of somitogenesis (Gomez et al., 2009). These changes in elongation are thought to be driven by the changes in cell and tissue mechanics we test in our manuscript. 

      Within our model, we cannot in general predict how the number of segments responds to changes in either clock parameters or cell mechanical parameters, as we lack understanding of what causes somitogenesis to cease; this is thus not encoded in our model and segmentation can in principle proceed indefinitely. Therefore, we have not performed this analysis.

      Similarly, we have not included an analysis of somite length. This is for two reasons: 1) as per the clock and wavefront model, the frequency at the PSM anterior (which we analyse) is equivalent to this measurement, as we assume (in general) the wavefront ($x = x_{a}$) is inertial. 2) the length of the nascent somite is not thought to be of much relevance to the adult phenotype, and by extension evolution. Somites undergo cell division and growth soon after their patterning by the segmentation clock, therefore their final size does not majorly depend on the dynamics of the segmentation clock. Rather, the main function of the clock is to control their number (and polarity).

      (6) The authors assume that the phase dynamics of the chemical network may be described by an oscillator with constant frequency. For the completeness of the manuscript, the authors should discuss in detail for which chemical networks this is a good assumption.

      We thank the reviewer for their suggestion and now justify this assumption in the methods on page 31. 

      Such an assumption is appropriate for the segmentation clock, as the clock in the posterior of the PSM is thought to oscillate with a constant frequency, at least for the majority of somitogenesis, although the frequency of somite formation slows towards the end of this process in zebrafish (Giudicelli et al., 2007, PLoS Biol.). In addition, PSM cells isolated and cultured in the presence of FGF (thus replicating the signalling environment of the posterior PSM) continue to exhibit her1 oscillations with an apparently constant frequency (Webb et al., 2016).

      We note that such formulations are widely used within the segmentation clock literature (e.g. Riedel-Kruse et al., 2007, Morelli et al., 2009).
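      For readers less familiar with this class of models, the sketch below integrates identical phase oscillators with a constant intrinsic frequency ω and all-to-all sinusoidal (Kuramoto-type) coupling of strength κ, using forward Euler. It is a simplified illustration under these assumptions only: it omits the coupling delays, spatial neighbourhoods and cell movement of the cited models, and the function names are hypothetical.

```python
import numpy as np

def order_parameter(theta):
    """Kuramoto order parameter r in [0, 1] (1 = perfect synchrony)."""
    return float(np.abs(np.mean(np.exp(1j * theta))))

def simulate_phase_oscillators(n, omega, kappa, dt, steps, seed=0):
    """Forward-Euler integration of
        d(theta_i)/dt = omega + (kappa / n) * sum_j sin(theta_j - theta_i)
    from uniformly random initial phases; returns final phases in [0, 2*pi)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    for _ in range(steps):
        diff = theta[None, :] - theta[:, None]  # diff[i, j] = theta_j - theta_i
        coupling = kappa * np.mean(np.sin(diff), axis=1)
        theta = theta + dt * (omega + coupling)
    return np.mod(theta, 2.0 * np.pi)

# With identical constant frequencies, positive coupling drives the population
# towards synchrony, whereas kappa = 0 leaves every oscillator running at the
# same constant frequency, so the initial phase spread is preserved.
print(order_parameter(simulate_phase_oscillators(50, omega=1.0, kappa=1.0, dt=0.01, steps=5000)))
```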

      (7) Figure 3 and the associated text show no effect of the cellular motility profile on the synchrony of the segmentation clock. Given the length of this manuscript, this could be moved to the supplementary material.

      Thank you for the suggestion. However, we would argue that the lack of effect is a crucial result when discussing modularity. Reviewer #2 agrees with this assessment.

      Reviewer #3 (Significance): 

      The manuscript answers some important questions on the synchrony of the segmentation clock in vertebrates, using a previously published model. However, the presented results are incomplete in some respects (points 2 to 5 of section A), which could be overcome by a more detailed analysis using a simpler one-dimensional model (point 1 of section A). I believe this manuscript could be of interest to an intersecting audience of developmental biologists, systems biologists, and physicists/engineers interested in dynamical systems.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      Farkas and colleagues conducted a comparative neuroimaging study with domestic dogs and humans to explore whether social perception in both species is underpinned by an analogous distinction between animate and inanimate entities, an established functional organizing principle in the primate and human brain. Presenting domestic dogs and humans with clips of three animate classes (dogs, humans, cats) and one inanimate control (cars), the authors also set out to compare how dogs and humans perceive their own vs other species. Both research questions have been previously studied in dogs, but the authors used novel dynamic stimuli and added animate and inanimate classes, which have not been investigated before (i.e., cats and cars). Combining univariate and multivariate analysis approaches, they identified functionally analogous areas in the dog and human occipitotemporal cortex involved in the perception of animate entities, largely replicating previous observations. This further emphasizes a potentially shared functional organizing principle of social perception in the two species. The authors also describe between-species divergences in the perception of the different animate classes, arguing for a less generalized perception of animate entities in dogs, but this conclusion is not convincingly supported by the applied analyses and reported findings.

      Strengths

      Domestic dogs represent a compelling model species to study the neural bases of social perception and potentially shared functional organizing principles with humans and primates. The field of comparative neuroimaging with dogs is still young, with a growing but still small number of studies, and the present study exemplifies the reproducibility of previous research. Using dynamic instead of static stimuli and adding new stimuli classes, Farkas and colleagues successfully replicated and expanded previous findings, adding to the growing body of evidence that social perception is underpinned by a shared functional organizing principle in the dog and human occipito-temporal cortex.

      Weaknesses

      The study design is imbalanced, with only one category of inanimate objects vs. three animate entities. Moreover, based on the example videos, it appears that the animate stimuli also differed from the car stimuli in the complexity of their content, often with multiple agents interacting or performing goal-directed actions. In addition, while dogs are familiar with cars, cars are definitely of lower relevance and interest to them than the animate stimuli. Thus, to a certain extent, the results might also reflect differences in attention towards, or salience of, the stimuli.

      We agree with the Reviewer and were aware that using only one class of inanimate objects but three classes of animate entities, along with the differences in complexity and relevance between the animate and the inanimate stimuli, may have elicited more attention to the animate conditions and thus introduced a confound. We are revising the related limitation in the discussion to acknowledge this and to explain why we believe these differences do not compromise our main findings.

      The methods section and the rationale behind the chosen approaches were often difficult to follow and lacked a lot of information, which makes it difficult to judge the evidence and the drawn conclusions and weakens the potential for reproducibility of this work. For example, for many preprocessing and analysis steps, parameters or descriptions of the tools used were missing, no information on the anatomical masks and atlas used in humans was provided, and it is often not clear whether the authors are referring to the univariate or the multivariate analysis.

      We acknowledge the concerns regarding the clarity and completeness of the methods section and are significantly revising the descriptions of the methods. Of note, in humans, the Harvard-Oxford Cortical Structural Atlas (Frazier et al., 2005; Makris et al., 2006; Desikan et al., 2006; Goldstein et al., 2007), implemented within the FSL software package, was used for anatomical masks, while the Automated Anatomical Labeling atlas (Tzourio-Mazoyer et al., 2002) was used for assigning labels.

      Regarding the chosen approaches and rationale, the authors generally binarize a lot of rich information. Instead of directly testing potential differences in the neural representations of the different animate entities, they binarize dissimilarity maps for, e.g., animate entity > inanimate cars and then calculate the overlap between the maps.

      We thank the Reviewer for these comments and ideas. We also thank the second Reviewer for their related concerns and suggestions about the overlap calculation. Since the neural processing of different animate entities in the dog brain is largely unexplored, in some of our analyses we aimed to provide a straightforward and directly comparable characterization of animacy perception in the two species. We believe that a measure of how much the neural representations of different animate classes overlap in the dog vs. the human visual cortex is a simple but meaningful and insightful characterization of how animacy perception is structured in the two species, despite the lack of spatial detail. Our decision to use binarization was based on these considerations. In response to this Reviewer’s request for richer information, in our revised manuscript we will present more details and additional non-binarized calculations. Specifically, we will use non-binarized data to present the response profiles of a broad, anatomically defined set of regions that have been related in other works to visual functions, and thus show where there are significant differences and overlaps between the neural responses to the three animate classes in each species.

      The comparison of the overlap of these three maps between species is also problematic, considering that the human RSA was restricted to the occipital and temporal cortex (there is no information on how they defined it) vs. whole-brain in dogs.

      We thank this Reviewer for raising yet another relevant point about overlap calculation. We note that the overlap calculation for univariate results used the visually responsive cortex in both dogs and humans. The decision to restrict the multivariate analysis to the occipital and temporal lobes in humans, where the visual areas are, was to reduce computational load. Since RSA in dogs yielded significant voxels almost exclusively in the occipital and temporal cortices, we believe this decision did not introduce major bias in our results. This concern will also be discussed in our revised submission.

      Of note, in the category- and class-boundary test, as for the other multivariate tests, the occipital and temporal cortex of humans was delineated based on the MNI atlas.

      Considering that the stimuli do differ based on low-level visual properties (just not significantly within a run), the RSA would also allow the authors to directly test if some of the (dis)similarities might be driven by low-level visual features like they, e.g. did with the early visual cortex model. I do think RSA is generally an excellent choice to investigate the neural representation of animate (and inanimate) stimuli, but the authors should apply it more appropriately and use its full potential.

      We thank the Reviewer for this suggestion. While this study did not aim to investigate the correlation between low-level visual features and animacy, the data is available, and the suggested analysis can be conducted in the future. This issue will also be discussed in our revised submission.
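      The model-comparison step the Reviewer refers to can be sketched for a single region or searchlight sphere as follows. The sketch assumes correlation distance (1 minus Pearson r) for the representational dissimilarity matrices (RDMs) and a Spearman correlation between their upper triangles. The low-level feature matrix stands in for any model output per condition (e.g. early-visual-model responses), all names are illustrative, and statistical inference (e.g. permutation testing) is not shown.

```python
import numpy as np

def rdm(patterns):
    """Correlation-distance RDM; rows of `patterns` are conditions."""
    return 1.0 - np.corrcoef(patterns)

def upper_triangle(m):
    """Off-diagonal upper-triangle entries of a square matrix, as a vector."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]

def ranks(v):
    """Ordinal ranks of a vector (assumes no ties)."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def spearman(a, b):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])

def rsa_model_fit(neural_patterns, model_features):
    """Correlate the neural RDM with a model RDM built over the same conditions."""
    return spearman(upper_triangle(rdm(neural_patterns)),
                    upper_triangle(rdm(model_features)))

# Usage with random placeholder data: 8 conditions x 50 voxels, 8 x 30 model features.
rng = np.random.default_rng(1)
neural = rng.normal(size=(8, 50))
low_level = rng.normal(size=(8, 30))
print(rsa_model_fit(neural, low_level))
```

      In a searchlight, rsa_model_fit would be evaluated once per sphere, yielding a map of how well low-level features account for the local dissimilarity structure.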

      The authors localized some of the "animate areas" also with the early visual cortex model (e.g. ectomarginal gyrus, mid suprasylvian); in humans, it only included the known early visual cortex - what does this mean for the animate areas in dogs?

      We thank the Reviewer for raising this point. Although the labels are the same, both EMG and mSSG are relatively large gyri, and the clusters revealed by each of the two analyses hardly overlap, with peak coordinates more than 12 mm apart for R EMG, and in different hemispheres for mSSG (but more than 11 mm apart even if projected on the same hemisphere). We will detail the differences and the overlaps in the revised submission.

      The results section also lacks information and statistical evidence; for example, for the univariate region-of-interest (ROI) analysis (called response profiles) comparing activation strength towards each stimulus type, it is not reported if comparisons were significant or not, but the authors state they conducted t-tests. The authors describe that they created spheres on all peaks reported for the contrast animate > inanimate, but they only report results for the mid suprasylvian and occipital gyrus (e.g. caudal suprasylvian gyrus is missing).

      We thank this Reviewer for catching these errors. The missing statistics will be provided in the revised manuscript. Also, in the figure depicting the response profiles, we mistakenly labelled the peak in the caudal suprasylvian gyrus as "occipital gyrus". This will also be corrected.

      Furthermore, considering that the ROIs were chosen based on the contrast animate > inanimate stimuli, activation strength should only be compared between animate entities (i.e., dogs, humans, cats), while cars should not be reported (as this would be double dipping, after selecting voxels showing lower activation for that category).

      We thank both Reviewers for raising this relevant point about potential double dipping. The aim of this analysis was to describe the relationship between the neural responses elicited by the three animate stimulus classes, to show that the animacy-sensitive peaks are not driven solely by a greater response to a single animate class. We conducted t-tests only to assess significant differences between these three animate conditions, and no statistics were performed or reported for any animate class vs. inanimate comparison in these ROIs. In addition to providing the missing t-tests (comparing animate classes), in the revised version we will present response profiles and corresponding statistics for a broad set of additional, independent ROIs, defined either anatomically or functionally by other studies.

      The descriptive data in Figure 3B (pending statistical evidence) suggests there were no strong differences in activation for the three species in dog and human animate areas. Thus, the ROI analysis appears to contradict findings from the binary analysis approach to investigate species preference, but the authors only discuss the results of the latter in support of their narrative for conspecific preference in dogs and do not discuss research from other labs investigating own-species preference.

      Studying conspecific preference was not the primary aim of this study. We only used our data to characterize the animate-sensitive regions in this respect. The species-preference test provides an overall characterization of the entire animate-sensitive region, revealing a higher number of voxels with a maximal response to conspecific than to other stimuli in dogs (and a similar tendency in humans), confirming previous evidence of neural conspecific preference in visual areas in both species. The response profiles presented so far describe only the ROIs around the main animate-sensitive peaks and, as the Reviewer points out, in most cases reveal no significant conspecific bias. We believe there is no contradiction here: the entire animate-sensitive region may still be weakly conspecific-preferring, whereas the main animate-sensitive peaks are not; the centers of conspecific preference may be located elsewhere in the visual cortex and may be supported by mechanisms other than animacy sensitivity. In the revised manuscript, we will elaborate on this. Additionally, in response to other comments, and for a better and more coherent characterization of species preference (and animacy sensitivity) across the visual cortex, we will present response profiles for other, independently defined regions and explore conspecific sensitivity in those additional regions as well. Furthermore, we will discuss the related own-species preference literature in greater detail.

      The authors also unnecessarily exaggerate novelty claims. Animate vs inanimate and own vs other species perception have both been investigated before in dogs (and humans), so any claims in that direction seem unsubstantiated. Such claims are also not needed, as novelty itself is not a sign of quality. What is novel here, and what constitutes a theoretical advance beyond novelty, is, as noted, the conceptual extension and replication of previous work.

      We agree with this Reviewer regarding novelty claims in general, and we confirm that we had no intention to overstate the uniqueness of our results. We also did not mean to imply that this work would be the first one on animacy perception in dogs, which it obviously is not. But we understand that we could have been more explicit presenting our work as a conceptual extension and replication of previous works, and we are revising the wording of the discussion from this aspect.

      Overall, more analyses and appropriate tests are needed to support the conclusions drawn by the authors, as well as a more comprehensive discussion of all findings.

      We are thankful for all comments. We will revise the methods section to provide sufficient detail and ensure replicability; conduct additional analyses as detailed above; and provide a more comprehensive discussion of all findings.

      Reviewer #2 (Public review):

      Summary:

      The manuscript reports an fMRI study looking at whether there is animacy organization in a non-primate mammal, the domestic dog, that is similar to that observed in humans and non-human primates (NHPs). A simple experiment was carried out with four kinds of stimulus videos (dogs, humans, cats, and cars), and univariate contrasts and RSA searchlight analyses were performed. Previous studies have looked at this question or closely associated questions (e.g. whether there is face selectivity in dogs). The import of the present study is that it looks at multiple types of animate objects, dogs, humans, and cats, and tests whether there was overlapping/similar topography (or magnitude) of responses when these stimuli were compared to the inanimate reference class of cars. The main finding was of some selectivity for animacy, though this was primarily driven by the dog stimuli, which did overlap with the other animate stimulus types, but far less so than in humans.

      Strengths:

      I believe that this is an interesting study in so far as it builds on other recent work looking at category-selectivity in the domestic dog. Given the limited number of such studies, I think it is a natural step to consider a number of different animate stimuli and look at their overlap. While some of the results were not wholly surprising (e.g. dog brains respond more selectively for dogs than humans or cats), that does not take away from their novelty, such as it is. The findings of this study are useful as a point of comparison with other recent work on the organization of high-level visual function in the brain of the domestic dog.

      Weaknesses:

      (1) One challenge for all studies like this is a lack of clarity when we say there is organization for "animacy" in the human and NHP brains. The challenge is by no means unique to the present study, but I do think it brings up two more specific topics.

      First, one property associated with animate things is "capable of self-movement". While cognitively we know that cars require a driver, and are otherwise inanimate, can we really assume that dogs think of cars in the same way? After all, just think of some dogs that chase cars. If dogs represent moving cars as another kind of self-moving thing, then it is not clear we can say from this study that we have a contrast between animate vs inanimate. This would not mean that there are no real differences in neural organization being found.

      It was unclear whether all or some of the car videos showed them moving. But if many/most do, then I think this is a concern.

      We thank this Reviewer for raising this relevant point about the potential animacy of cars for dogs and its implication for our results. Of note, two-thirds of our car stimuli showed a car moving (slow, accelerating, or fast). We acknowledge that these stimuli contained motion-based animacy cues, and in this regard there was no clear difference between our animate and inanimate conditions, and possibly between some of the representations they elicited. However, our animate and inanimate stimuli differed in other key factors accounting for animacy organization, such as visual features including the presence of faces, bodies, body parts, postures, and certain aspects of biological motion. We therefore believe that this limitation does not compromise our main conclusions. We will elaborate on this point further in the revised discussion, also considering how dogs’ differential behavioral responses to cars and animate entities may provide additional insights in this regard.

      Second, there is quite a lot of potential complexity in the human case that is worth considering when interpreting the results of this study. In the human case, some evidence suggests that animacy may be more of a continuum (Sha et al. 2015), which may reflect taxonomy (Connolly et al. 2012, 2016). However moving videos seem to be dominated more by signals relevant to threat or predation relative to taxonomy (Nastase et al. 2017). Some evidence suggests that this purported taxonomic organization might be driven by gradation in representing faces and bodies of animals based on their relative similarity to humans (Ritchie et al. 2021). Also, it may be that animacy organization reflects a number of (partially correlated) dimensions (Thorat et al. 2019, Jozwik et al. 2022). One may wonder whether the regions of (partial) overlap in animate responses in the dog brain might have some of these properties as well (or not).

      We agree that it would be interesting to dissect which animacy-related factor(s) contribute to the observed animacy sensitivity in different regions, and although this was not the original aim of the study, we agree that we could have made better use of the variation in our stimuli to discuss this aspect. Specifically, some animacy features are shared by all three animate stimulus classes, namely the presence of biological motion, faces, and bodies. In contrast, the animate classes differed in some other aspects, for example in how dogs perceive dogs, humans, and cats as social agents and in their potential behavioral goals towards them. It can therefore be argued that regions with two- and especially three-way overlapping activations are more probably involved in processing biological motion, face, and body aspects, and non-overlapping ones in the social agency- and behavioral goal-related aspects. In line with this, the shared animacy features are indeed ones that have been reported to be central in human animacy representation and that may have made the overlaps in human brain responses greater. We will provide a more detailed discussion of the results from this viewpoint in the revised manuscript.

      (2) It is stated that previous studies provide evidence that the dog brain shows selectivity to "certain aspects of animacy". One of these already looked at selectivity for dog and human faces and bodies and identified similar regions of activity (Boch et al. 2023). An earlier study by Dilks et al. (2015), not cited in the present work (as far as I can tell), also used dynamic stimuli and did not suffer from the above limitations in choosing inanimate stimuli (e.g. using toy and scene objects for inanimate stimuli). But it only included human faces as the dynamic animate stimulus. So, as far as stimulus design, it seems the import of the present study is that it included a *third* animate stimulus (cats) and that the stimuli were dynamic.

      We agree with this Reviewer that the findings of Dilks et al. (2015) are relevant to our study and have therefore cited them. However, the citation itself was imprecise and will be corrected in the revised manuscript.

      (3) I am concerned that the univariate results, especially those depicted in Figure 3B, include double dipping (Kriegeskorte et al. 2009). The analysis uses the response peak for the A > iA contrast to then look at the magnitude of the D, H, C vs iA contrasts. This means the same data is being used for feature selection and then to estimate the responses. So, the estimates are going to be inflated. For example, the high magnitudes for the three animate stimuli above the inanimate stimuli are going to be inherently inflated by this analysis and cannot be taken at face value. I have the same concern with the selectivity preference results in Figure 3E.

      I think the authors have two options here. Either they drop these analyses entirely (so that the total set of analyses really mirrors those in Figure 4), or they modify them to address this concern. I think this could be done in one of two ways. One would be to do a standard within-subject split-half analysis and use one half of the data for feature selection and the other for magnitude estimation. The other would be to do a between-subject design of some kind, like using one subject for magnitude estimation based on an ROI defined using the data for the other subjects.

      We thank both Reviewers again for raising this important point about potential double dipping. We also thank this Reviewer for specific suggestions for split-half analyses – we agree that, had our original analyses involved double dipping, such a modification would be necessary. But, as we explained in our response above, this was not the case. Indeed, whereas we do visualize all four conditions in Fig. 3B, we only conducted t-tests to assess differences between the three animate conditions (the corresponding stats have been missing from the original manuscript but will be added during revision). So, importantly, we did not evaluate the magnitude of the D, H, C vs iA contrasts in any of the ROIs defined by animate-sensitive peaks; therefore, we believe that these analyses do not involve double dipping. This holds for the species preference results in Fig. 3E as well. We will clarify this in the revised manuscript. Of note, in response to a request by the other reviewer and to provide richer information about the univariate results, we will also provide response profiles and corresponding stats for a broad set of additional ROIs, defined either anatomically or functionally by other studies (e.g., Boch et al., 2023).
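      For completeness, the within-subject split-half procedure suggested by the Reviewer can be sketched as below. Voxels are selected with the animate > inanimate contrast in one half of the data, condition responses are read out from the independent other half, and the two directions can then be swapped and averaged. The condition order (D, H, C, iA), the contrast weights, the top-k selection rule and all function names are illustrative assumptions, not the analysis used in the manuscript.

```python
import numpy as np

# Assumed condition order for the rows of each beta array: D, H, C, iA.
ANIMATE_VS_INANIMATE = np.array([1 / 3, 1 / 3, 1 / 3, -1.0])

def select_voxels(betas, weights, k):
    """Indices of the k voxels with the largest contrast value (weights @ betas)."""
    score = weights @ betas  # one contrast value per voxel
    return np.argsort(score)[::-1][:k]

def split_half_profile(betas_select, betas_estimate, weights, k):
    """Select voxels in one half of the data; average each condition in the other half."""
    idx = select_voxels(betas_select, weights, k)
    return betas_estimate[:, idx].mean(axis=1)  # one value per condition

def cross_validated_profile(betas_a, betas_b, weights, k):
    """Average of both selection/estimation directions (A to B and B to A)."""
    return 0.5 * (split_half_profile(betas_a, betas_b, weights, k)
                  + split_half_profile(betas_b, betas_a, weights, k))
```

      Because the estimation half never influences which voxels are selected, noise that happens to favour the animate conditions in the selection half does not inflate the reported magnitudes.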

      (4) There are two concerns with how the overlap analyses were carried out. First, when overlap is examined in humans, it is typically computed as the proportion of overlapping results for the contrasts of interest, e.g. for face and body selectivity overlap (Schwarzlose et al. 2006), hand and tool overlap (Bracci et al. 2012), or, more recently, tool and food overlap (Ritchie et al. 2024). There are a number of ways of then calculating the overlap, each with its own strengths and weaknesses (see Tarr et al. 2007). Of these, I think the Jaccard index is the most intuitive: it is simply the intersection of two sets as a proportion of their union. So, for example, the N of overlapping D > iA and H > iA active voxels is divided by the total number of unique active voxels for the two contrasts. Such an overlap analysis is more standard and interpretable relative to previous findings. I would strongly encourage the authors to carry out such an analysis or use a similar metric of overlap, in place of what they have currently performed (to the extent the analysis makes sense to me).

      We agree with this Reviewer that the Jaccard index is an intuitive and straightforward overlap measure. Importantly, for our overlap calculations we already use this measure (and a very similar one) – but we acknowledge that this was not clear from the original description. Specifically, for the multivariate overlap test, we used the Jaccard index exactly as described by this Reviewer. For the univariate overlap test, we use a very similar measure, with the only difference that there, to reference the search space, the intersection of specific animate-inanimate contrasts was divided by the total voxel number of animate-sensitive areas (which is highly similar to the union of the specific animate-inanimate contrasts). In the revised submission we will provide a more detailed explanation of the overlap calculations, making it explicit that we used the Jaccard index (and a variant of it).
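      To make the two overlap measures described above concrete, here is a minimal sketch, assuming each contrast's suprathreshold voxels are represented as a set of voxel indices. The function names and the toy voxel sets are hypothetical illustrations, not the analysis code used in the study.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search_space_overlap(a: set, b: set, search_space: set) -> float:
    """Variant: intersection of two contrasts divided by the size of a search
    space (e.g., all animate-sensitive voxels) that contains both contrasts."""
    return len(a & b) / len(search_space) if search_space else 0.0


# Hypothetical voxel indices, for illustration only
dog_gt_inanimate = {1, 2, 3, 4, 5, 6}
human_gt_inanimate = {4, 5, 6, 7, 8}
animate_sensitive = {1, 2, 3, 4, 5, 6, 7, 8, 9}

print(jaccard(dog_gt_inanimate, human_gt_inanimate))  # 3 / 8 = 0.375
print(search_space_overlap(dog_gt_inanimate, human_gt_inanimate, animate_sensitive))  # 3 / 9
```

      When the search space is exactly the union of the two contrasts, both functions return the same value; the larger the search space is relative to that union, the lower the variant becomes.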

      Second, the results summarized in Figure 3A suggest multiple distinct regions of animacy selectivity. Other studies have also identified similar networks of regions (e.g. Boch et al. 2023). These regions may serve different functions, but the overlap analysis does not tell us whether there is overlap in some of these portions of the cortex and not in others. The overlap is only looked at in a very general sense. There may be more overlap locally in some portions of the cortex and not in others.

      We thank this Reviewer for this comment; we agree that adding spatial specificity to these results will improve the manuscript. Therefore, during revision, we will assess the anatomical distribution of the overlap results, making use of a broad set of ROIs potentially relevant for animacy perception, defined either anatomically or functionally by other studies (e.g., Boch et al., 2023 for dogs).

      (5) Two comments about the RSA analyses. First, I am not quite sure why the authors used HMAX rather than layers of a standardly trained ImageNet deep convolutional neural network. This strikes me also as a missed opportunity, since many labs have looked at whether later layers of DNNs trained on object categorization show similar dissimilarity structures as category-selective regions in humans and NHPs. Insofar as cross-species comparisons are the motivation here, it would be genuinely interesting to see what would happen if one did a correlation searchlight with the dog brain and layers of a DNN, à la Cichy et al. (2016).

      We thank the Reviewer for this comment and suggestion. At the start of the project, HMAX was the most feasible model to implement given our time and expertise constraints. Additionally, the biologically motivated HMAX model was also an appropriate choice, as it simulates the selective tuning of neurons in the primary visual cortex (V1) of primates, which is considered homologous with V1 in carnivores (Boch et al., 2024).

      Although we agree that DNNs have recently been used extensively and successfully to explore object representations and could provide valuable additional insights into dogs’ visual perception as well, we believe that adding a large set of additional analyses would go beyond the scope of this manuscript, disproportionately shifting its focus from our original research question. Also, our experiment, designed with a different, more specific aim in mind, did not include a sufficiently varied set of animate stimuli for a general comparison of the cortical hierarchy underlying object representations in dog and human brains; thus, our data are not an optimal starting point for such extensive explorations. Having said that, we are thankful to this Reviewer for the idea and will consider using a DNN to uncover dogs’ visual cortical hierarchy in future studies with a better-suited stimulus set. Furthermore, in accordance with eLife’s data-sharing policies, we will make the current dataset publicly available so that further hypotheses and models can be tested.

      Second, from the text it is hard to tell what the models for the class- and category-boundary effects were. Are there RDMs that can be depicted here? Although I am very familiar with RSA searchlight, I found the description of the methods to be rather opaque. The point made earlier about overlap for the univariate results also applies to the RSA results. This is also another reason to potentially compare DNN RDMs to both the categorical models and the brains of both species.

      In the revised manuscript we will provide a more detailed explanation of the methods used to determine class- and category-boundary effects. In short, the analysis we performed here followed Kriegeskorte et al. (2008), and the searchlight test looked for regions in which between-class/category differences were greater than within-class/category differences. We will also include RDMs. Additionally, we will provide anatomical details for the overlap results for RSA, just as for the univariate results, using the same independently defined broad set of ROIs, defined either anatomically or functionally by other studies (e.g., Boch et al., 2023 for dogs).
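      The boundary test described above, in which between-class/category differences are expected to exceed within-class/category differences, can be sketched for a single region as follows. This assumes a precomputed, symmetric RDM for the conditions in that region and a label per condition for whichever partition (class or category) is being tested; the function names and toy values are hypothetical, and the sketch omits the searchlight sweep and the group-level statistics.

```python
from itertools import combinations
from statistics import mean


def boundary_effect(rdm, labels):
    """Mean between-label minus mean within-label dissimilarity.

    rdm: symmetric matrix (list of lists) of pairwise dissimilarities
    labels: class or category label of each condition (row/column of rdm)
    """
    within, between = [], []
    # Visit each unordered pair of conditions once (upper triangle, no diagonal)
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(rdm[i][j])
    return mean(between) - mean(within)


def model_rdm(labels):
    """Categorical model RDM: 0 for same-label pairs, 1 for different-label pairs."""
    return [[0 if a == b else 1 for b in labels] for a in labels]


# Toy example with hypothetical dissimilarities for four conditions
labels = ["animate", "animate", "inanimate", "inanimate"]
rdm = [
    [0.0, 0.2, 0.8, 0.9],
    [0.2, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
]
print(boundary_effect(rdm, labels))  # approximately 0.8 - 0.25 = 0.55
print(model_rdm(labels))
```

      A positive value indicates that conditions are more dissimilar across the boundary than within it; in a searchlight, this statistic would be computed for each sphere and then tested across subjects.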

      (6) There has been emphasis of late on the role of face and body selective regions and social cognition (Pitcher and Ungerleider, 2021, Puce, 2024), and also on whether these regions are more specialized for representing whole bodies/persons (Hu et al. 2020, Taubert, et al. 2022). It may be that the supposed animacy organization is more about how we socialize and interact with other organisms than anything about animacy as such (see again the earlier comments about animacy, taxonomy, and threat/predation). The result, of a great deal of selectivity for dogs, some for humans, and little for cats, seems to readily make sense if we assume it is driven by the social value of the three animate objects that are presented. This might be something worth reflecting on in relation to the present findings.

      We thank the Reviewer for this suggestion. The original manuscript already discussed how motion-related animacy cues involved in social cognition may explain why the animacy-sensitive regions reported in our study extend beyond those reported previously, as well as the role of biological motion in the observed across-species differences. This discussion of visually diagnostic features and features involved in perceiving social agents will be extended in the revised discussion, also in response to this Reviewer’s first comment, to reflect on how social cognition-related animacy cues may have affected our results in dogs.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Dad et al. explored the roles of cytosolic carboxypeptidase 5 (CCP5) in the development of ependymal multicilia in the brain. Members of the CCP family are erasers of polyglutamylation on ciliary axoneme microtubules. The authors generated a new mutant mouse of the Agbl5 gene, which encodes CCP5, with deletion of its N-terminus and part of the carboxypeptidase (CP) domain (named AGBL5M1/M1).

      Strengths:

      The mutant mice revealed lethal hydrocephalus due to degeneration of ependymal multicilia. Interestingly, this is in contrast with the phenotype of Agbl5 mutants with disruption solely in the CP domain of CCP5 (named AGBL5M2/M2) that did not develop hydrocephalus despite increased glutamylation levels in ependymal cilia as observed for AGBL5M1/M1 mutants. The study has been well-performed and the findings suggest a unique function of the N-domain of CCP5 in ependymal multicilia stability.

      Weaknesses:

      The content of this article is relatively descriptive and lacks molecular insights.

      We thank the Reviewer for the positive comments. To gain molecular insight into the dysregulated planar cell polarity (PCP) in Agbl5<sup>M1/M1</sup> ependyma, we are planning to further assess microtubule polarization and the expression/localization of PCP core proteins in ependymal cells. We also plan to quantify the intensity of actin networks around BB patches to better understand the extent to which they are affected in the ependyma of the mutants and contribute to the impaired stability of BBs (please see below).

      We will also assess whether Agbl5 commonly functions in multiciliated cells of other organs.

      Reviewer #2 (Public review):

      Summary:

      This study analyzed the consequences of Agbl5 mutation on ependymal cell development and function. The authors first characterize their mutant mouse line, reporting a reduced lifespan and severe hydrocephalus. Next, they report a defect in ependymal cell cilia number and motility. They provide evidence for impaired basal body organisation and cilia glutamylation.

      Strengths:

      Description of a mutant mouse line that implicates Cytosolic Carboxypeptidase 5 (the product of the Agbl5 gene) in proper ependymal cell development and function.

      Weaknesses:

      Description of phenotype is incomplete:

      We thank the Reviewer for the constructive comments. We agree that more quantitative analysis of the phenotypes in Agbl5<sup>M1/M1</sup> mice will strengthen this study.

      - Figure 3G - the sequence from the movie is not really informative. Providing beating frequencies as quantification of the data would be more informative.

      We agree that quantification of the cilia beating frequencies and directions in these experiments will be more informative.

      - Figure 3 - the quantification of actin network would strengthen the message.

      We agree with the Reviewers. We will quantify the total intensity of actin around BB patches and the total intensity of actin per BB to determine the extent to which the actin networks are affected in Agbl5<sup>M1/M1</sup> ependymal cells.

      - Lines 219 -220 - the authors conclude “Taken together, in Agbl5<sup>M1/M1</sup> ependymal cells, the expression of genes promoting multiciliogenesis were not impaired but certain proteins associated with differentiated ependymal cells are not properly expressed”. However, they do not assess gene but protein expression (IF). In addition, their quantification shows differences in the number of FoxJ1 positive cells which indeed is an impaired expression.

      We will clarify this statement.

      - Microtubules are involved in the local organization of ciliary basal bodies (see Werner et al., Vladar et al.,2011; Boutin et al., 2014). It would be interesting for the authors to check whether the subapical network of microtubules is glutamylated or not during ependymal cell differentiation and how this network is affected in their mutants.

      We thank the Reviewer for the suggestion. We agree this is an interesting point to look at. We will assess the glutamylation status of the subapical microtubule networks in differentiating ependymal cells and whether they are affected in the mutants.

      - Showing the data mentioned in the discussion on Cep110 would be a nice addition to the paper.

      These results will be provided.

      - Line 354: "The latter serves as a component of tissue polarity that is required for asymmetric PCP protein localization in each cell (Boutin et al., 2014; Vladar et al., 2012)." The cited reference did not demonstrate that this microtubule network is required for asymmetric PCP localization.

      We thank the Reviewer for the careful reading. We will correct the citation.

      Reviewer #3 (Public review):

      Summary:

      The authors developed a new Agbl5 KO allele, extending the deletion to the N-terminus of CCP5 to explore its function in mouse ependymal cells.

      Strengths:

      They show that the KO mice exhibit severe hydrocephalus due to disorganized and mislocated basal bodies. Additionally, they present evidence of both impaired beating coordination and a reduction in ciliary beating.

      Weaknesses:

      The manuscript is well-written but lacks specific interpretations of the results presented. Further experiments are needed to be fully convincing.

      We thank the Reviewer for the comments. We plan to conduct the following experiments to strengthen this study.

      (1) Quantify the intensity of actin staining around BB patches and its intensity relative to the number of BBs to assess the extent to which the actin networks in Agbl5<sup>M1/M1</sup> ependymal cells are affected (please refer to the above response to the comments of Reviewer #2).

      (2) Co-stain tdTomato with cell-type-specific markers to better define the spatial expression pattern of tdTomato.

      (3) Seek proper antibodies to determine the correlation between signals of GT335 and Ac-Tub in ependymal multicilia of Agbl5<sup>M1/M1</sup> mice.

      (4) Quantitatively compare the size of ependymal cells in the wild-type and Agbl5<sup>M1/M1</sup> mice to address whether there is a consequence of possible dysfunction of primary cilia in the precursors of ependymal cells in the mutants. If so, we will further analyze how the primary cilia in the precursors of ependymal cells are affected in the mutants.

      (5) Address whether the rotational polarity is affected in the Agbl5<sup>M1/M1</sup> mutant mice.

    1. Author response:

      To address Reviewer 1’s concerns, we will implement the following changes:

      Comment 1: We will clarify that, even without direct comparisons within or across species, whether vertically transmitted microbes act as pioneering colonizers or integrate into an existing community is an important factor influencing their effect on community composition.

      Comment 2: We will provide additional details on the biology of the surrogate frog Oophaga sylvatica, explain how tadpole manipulation might influence adhesion to the caregiver, and acknowledge that the lack of knowledge on the physiological mechanisms underlying tadpole attachment currently limits our discussion to speculation.

      We will further clarify in the “Methods” section that SourceTracker’s ability to accurately estimate source proportions was assessed by evaluating how well it assigned training samples to their correct source environments. We will provide the predictions for the training set and describe how they informed our data preprocessing and analysis approach.

      Comment 3: While we predicted that community distances between tadpoles and adults would be smaller in species with parental transport, we explicitly state that our results did not confirm this expectation. We thus see no contradiction in our discussion but will ensure that this point is more clearly communicated. In response to the reviewer’s suggestion, we will incorporate additional literature on how tadpoles’ skin microbial communities change over time and adapt to their environment. We will also expand on how the life history of L. longirostris—specifically, the frequent presence of adults in tadpole habitats—may facilitate horizontal microbiota transmission, potentially contributing to shorter community distances.

      Comment 4: We will remove the network visualization to prevent any misinterpretation.

      Additionally, following Reviewer 2’s suggestion, we will include data on the absolute abundance of ASVs shared between parent and offspring after one month of development to further support the manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1:

      The authors have thoroughly changed the manuscript and addressed most of my concerns. I appreciate adding the activity assays of the C115/120S mutants, however, I suggest that the authors embed and also discuss these data more clearly. It also escaped my attention earlier that the positioning of the disulfide bond is 117-122 in the deposited PDBs instead of 115-120. The authors should carefully check which positioning is correct here.

      We thank reviewer #1 for his or her careful assessment of our revised manuscript. As suggested, we expanded the results section “CrSBPase enzymatic activity” with additional numerical values, and discussed more clearly the comparison of activity assay results for mutants C115S and C120S in the section “Oligomeric states of CrSBPase”. Residue numbering was carefully proofread throughout the manuscript for correctness and consistency. C115 and C120 are numbered according to the consensus of the main databases, i.e., GenBank and UniProt; numbering may differ from one database to another (including the PDB) owing to different numbering rules. We clarified the chosen nomenclature in the methods section “Cloning and mutagenesis of CrSBPase expression plasmids”.

      Line 246-250: I think it is evident that the two SBPase structures superpose well given the sequence identity of more than 70%. However, it would be great to include a superposition of the two structures in Figure 1, especially with regard to the region harboring C115 and C120.

      We added a panel showing the superimposition of CrSBPase 7b2o and PpSBPase 5iz3, with a close-up view of the C115-C120 region, in supplementary figure 5. Given the information density of figure 1, we prefer not to add further images to it. Supplementary figure 5 was initially intended to illustrate sequence conservation/variation among homologs, and thus fits the objective of comparing past and present XRC results.

      Line 255-266: I am again missing a panel in Figure 1 here, e.g. a side-by-side view of X-ray vs. AF2/3 structure.

      We added another panel in supplementary figure 5 to visually compare side-by-side SBPase crystallographic structure 7b2o and our AF3 model. Again, for the sake of clarity we prefer not to overload figure 1 with additional panels. This will also enable thorough comparison of past XRC of PpSBPase, present XRC of CrSBPase, and various AF models (see below, oligomer comparisons).

      Line 261-266: Did the authors predict dimers and tetramers using AF3? What are the confidence metrics in this case? Do the authors see differences to the monomer prediction in case a multimer is confidently predicted?

      We modeled dimers and tetramers using AF3 and added them to supplementary figure 5, side by side with the protomer of XRC model 7b2o and with the monomer predicted by AF3. The color code for supplementary figure 5 panels F-H follows the standard AF representation of pLDDT. Per-residue confidence metrics correspond to very high reliability (navy blue) or, locally, confident prediction (cyan), and overall prediction scores range from pTM=0.85-0.91, indicating a high-quality prediction. The interface prediction score is high for both the dimer (ipTM=0.9) and the tetramer (ipTM=0.82). We reported these data in supplementary figure 5 and the corresponding updated legend. XRC and AF models all align with RMSD<0.5 Å, indicating a globally unchanged structure of the protomer across methods and oligomeric states.

      Line 441: How does the oligomeric equilibrium change in C115/120S mutants? This information should be added for the mutants. Besides, the mAU units in Fig. 6 could be normalized to allow an easier comparison between the chromatograms of wt and mutants.

      Change in oligomeric equilibrium is assessed by size-exclusion chromatography of WT and mutants C115S, C120S as reported in figure 6A. We made quantitative estimation of WT, and C115S and C120S mutants equilibrium by comparing maximal peak intensity and added this information in the text. Briefly, the oligomer ratio on a scale of 100 is 9:48:43 for WT, 42:25:33 for mutant C115S, and 29:17:54 for mutant C120S (ratio expressed as tetramer:dimer:monomer). We prefer not to normalize values of absorbance, but rather keep the actual measurement of absorbance at 280 nm on the chromatogram of figure 6, for the sake of consistency with the added text and for a more transparent report of the experiment.
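      As an illustration of the arithmetic behind a tetramer:dimer:monomer ratio on a scale of 100, the sketch below expresses each peak maximum as a percentage of the summed maxima. It assumes that one maximal A280 value per oligomeric state has already been read off the size-exclusion chromatogram; the function name and the input numbers are hypothetical and are not the measured values.

```python
def oligomer_percentages(peak_heights):
    """Express peak maxima (e.g., A280 in mAU) as percentages of their sum.

    peak_heights: dict mapping oligomeric state to its maximal peak intensity.
    Each percentage is rounded independently, so totals may differ from 100.
    """
    total = sum(peak_heights.values())
    return {state: round(100 * h / total) for state, h in peak_heights.items()}


# Hypothetical peak maxima, for illustration only
toy = {"tetramer": 15.0, "dimer": 60.0, "monomer": 75.0}
print(oligomer_percentages(toy))  # {'tetramer': 10, 'dimer': 40, 'monomer': 50}
```

      Peak height is the simplest proxy for relative abundance; when peaks overlap substantially, integrated peak areas are a common alternative.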

      Line 447: WT activity is 12.15+-2.15 and both mutants have a higher activity. The authors should check if their values (96% and 107%) are correct. Besides, did the authors check if the increase in C120S is statistically significant? My impression is that both mutants have a higher activity than the wildtype, in both correlating with increased fractions of the tetramer. This would also make sense, as the corresponding region is part of the tetramer interface in the crystal packing.

      The reported activity values were checked for correctness. The wild-type SBPase specific activity of 12.5 ±2.15 µmol(NADPH) min<sup>-1</sup> mg(SBPase)<sup>-1</sup> was obtained by pre-incubating the enzyme with 1 µM CrTRXf2 supplemented with 1 mM DTT and 10 mM Mg<sup>2+</sup>, whereas the comparison of WT and mutant activation reported in supplementary figure 14, with variations of 107% and 96%, was obtained with a slightly different pre-incubation protocol (10 mM DTT and 10 mM Mg<sup>2+</sup>). Please note that whether the WT enzyme was assayed with 10 mM DTT and 10 mM Mg<sup>2+</sup> or with 1 µM TRX, 1 mM DTT and 10 mM Mg<sup>2+</sup>, its specific activity appears equal within experimental error. In the assay reported in supplementary figure 14, both mutants have nearly the same activity as the WT: we fully agree that the 107% (and 96%) variation is indeed not significant considering the uncertainty of the measurement (see error bars representing standard deviations of the mean in supplementary figure 14). We added this important information to the text. Even though both mutations stabilize the most active tetramer in the untreated recombinant protein, we think that after reducing treatment the WT and both mutants reach the same maximal activity because they all form an equivalent proportion of the active tetramer relative to the alternative oligomeric states. We further interpret these data as a decoupling of reduction and catalysis: under physiological conditions, we assume that SBPase activation would be initiated by the reduction of disulfide bridges, including but not limited to C115-C120, which restricts entry into the fully active tetramer; at that point, reduced SBPase reaches maximal activity until another post-translational signal eventually changes its conformation and oligomerisation.

      We thank again reviewer 1 for his or her assessment and valuable suggestions.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      First, the authors confirm the up-regulation of the main genes involved in the three branches of the Unfolded Protein Response (UPR) system in diet-induced obese mice in AT, observations that have been extensively reported before. Not surprisingly, IRE1a inhibition with STF led to an amelioration of the obesity and insulin resistance of the animals. Moreover, non-alcoholic fatty liver disease was also improved by the treatment. More novel are their results in terms of thermogenesis and energy expenditure, where IRE1a seems to act via activation of brown AT. Finally, mice treated with STF exhibited significantly fewer metabolically active and M1-like macrophages in the AT compared to those under vehicle conditions. Overall, the authors conclude that targeting IRE1a has therapeutical potential for treating obesity and insulin resistance.

      The study has some strengths, such as the detailed characterization of the effect of STF in different fat depots and a thorough analysis of macrophage populations. However, the lack of novelty in the findings somewhat limits the study's impact on the field.

      We thank the reviewer for the appreciation of our findings. We would like to use this opportunity to highlight several novel aspects. First, we characterized the relationship between the newly discovered CD9<sup>+</sup> ATMs and the “M1-like” CD11c+ ATMs. Second, we demonstrated that the M2 macrophage population was not reduced but instead increased in adipose tissue in obesity. Third, IRE1 inhibition does not improve thermogenesis by boosting the M2 population; instead, it suppresses pro-inflammatory macrophage populations, including the M1-like ATMs.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Wu D. et al. explores an innovative approach in immunometabolism and obesity by investigating the potential of targeting macrophage Inositol-requiring enzyme 1α (IRE1α) in cases of overnutrition. Their findings suggest that pharmacological inhibition of IRE1α could influence key aspects such as adipose tissue inflammation, insulin resistance, and thermogenesis. Notable discoveries include the identification of High-Fat Diet (HFD)-induced CD9<sup>+</sup> Trem2+ macrophages and the reversal of metabolically active macrophages' activity with IRE1α inhibition using STF. These insights could significantly impact future obesity treatments.

      Strengths:

      The study's key strengths lie in its identification of specific macrophage subsets and the demonstration that inhibiting IRE1α can reverse the activity of these macrophages. This provides a potential new avenue for developing obesity treatments and contributes valuable knowledge to the field.

      Weaknesses:

      The research lacks an in-depth exploration of the broader metabolic mechanisms involved in controlling diet-induced obesity (DIO). Addressing this gap would strengthen the understanding of how targeting IRE1α might fit into the larger metabolic landscape.

      We thank the reviewer for the appreciation of strengths in our manuscript. In particular, we appreciate the reviewer’s recommendation on the exploration of broader metabolic landscape, such as the effect of IRE1 inhibition on non-adipose tissue macrophages and metabolism. We agree that achieving these will certainly broaden the therapeutic potential of IRE1 inhibition to larger metabolic disorders and we will pursue these explorations in future studies.

      Impact and Utility:

      The findings have the potential to advance the field of obesity treatment by offering a novel target for intervention. However, further research is needed to fully elucidate the metabolic pathways involved and to confirm the long-term efficacy and safety of this approach. The methods and data presented are useful, but additional context and exploration are required for broader application and understanding.

      Comments on revisions:

      The author has revised the manuscript and addressed the most relevant comments raised by the reviewers. The paper is now significantly improved, though two minor issues remain.

      (1) Studies were limited to male mice; this should be mentioned in the paper's Title.

      Thank you for the comment. We have modified the title to reflect that only male mice were studied.

      (2) Please include the sample size (n=) in all provided tables in the main manuscript and supplementary tables.

      We have included the sample size in the main manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Bacterial effectors that interfere with the inner molecular workings of eukaryotic host cells are of great biological significance across disciplines. On the one hand they help us to understand the molecular strategies that bacteria use to manipulate host cells. On the other hand they can be used as research tools to reveal molecular details of the intricate workings of the host machinery that is relevant for the interaction/defence/symbiosis with bacteria. The authors investigate the function and biological impact of a rhizobial effector that interacts with and modifies, and curiously is modified by, legume receptors essential for symbiosis. The molecular analysis revealed a bacterial effector that cleaves a plant symbiosis signaling receptor to inhibit signaling and the host counterplay by phosphorylation via a receptor kinase. These findings have potential implications beyond bacterial interactions with plants.

      Bao and colleagues investigated how rhizobial effector proteins can regulate the legume root nodule symbiosis. A rhizobial effector is described to directly modify symbiosis-related signaling proteins, altering the outcome of the symbiosis. Overall, the paper presents findings that will have a wide appeal beyond its primary field.

      Out of 15 identified effectors from Sinorhizobium fredii, they focus on the effector NopT, which exhibits proteolytic activity and may therefore cleave specific target proteins of the host plant. They focus on two Nod factor receptors of the legume Lotus japonicus, NFR1 and NFR5, both of which were previously found to be essential for the perception of rhizobial nod factor, and the induction of symbiotic responses such as bacterial infection thread formation in root hairs and root nodule development (Madsen et al., 2003, Nature; Tirichine et al., 2003; Nature). The authors present evidence for an interaction of NopT with NFR1 and NFR5. The paper aims to characterize the biochemical and functional consequences of these interactions and the phenotype that arises when the effector is mutated.

      Evidence is presented that NopT can cleave NFR5 at its juxtamembrane region in vitro. NFR5 also appears to be cleaved in vivo, and NFR1 appears to inhibit the proteolytic activity of NopT by phosphorylating NopT. When NFR5 and NFR1 are ectopically over-expressed in leaves of the non-legume Nicotiana benthamiana, they induce cell death (Madsen et al., 2011, Plant Journal). Bao et al. found that this cell death response is inhibited by the coexpression of nopT. Mutation of nopT alters the outcome of rhizobial infection in L. japonicus. These conclusions are well supported by the data.

      The authors present evidence supporting the interaction of NopT with NFR1 and NFR5. In particular, there is solid support for cleavage of NFR5 by NopT (Figure 3) and the identification of NopT phosphorylation sites that inhibit its proteolytic activity (Figure 4C). Cleavage of NFR5 upon expression in N. benthamiana (Figure 3A) requires appropriate controls (inactive mutant versions) that have been provided, since Agrobacterium as a closely rhizobia-related bacterium might increase defense related proteolytic activity in the plant host cells.

      We appreciate your recognition of the importance of appropriate controls in our experimental design. In response to your comments, we revised our manuscript to ensure that the figures and legends provide a clear description of the controls used. We also included a more detailed description of our experimental design in several places. In particular, we have highlighted the use of the protease-dead version of NopT as a control (NopT<sup>C93S</sup>). Therefore, NFR5-GFP cleavage in N. benthamiana clearly depended on the protease activity of NopT and not on Agrobacterium (Fig. 3A). In the revised text, we carefully revised the conclusion and no longer conclude at this stage that NopT proteolyzes NFR5. However, our subsequent experiments, including in vitro experiments, clearly show that NopT is able to proteolyze NFR5.

      Key results from N. benthamiana appear consistent with data from recombinant protein expression in bacteria. For the analysis in the host legume L. japonicus transgenic hairy roots were included. To demonstrate that the cleavage of NFR5 occurs during the interaction in plant cells the authors build largely on western blots. Regardless of whether Nicotiana leaf cells or Lotus root cells are used as the test platform, the Western blots indicate that only a small proportion of NFR5 is cleaved when co-expressed with nopT, and most of the NFR5 persists in its full-length form (Figures 3A-D). It is not quite clear how the authors explain the loss of NFR5 function (loss of cell death, impact on symbiosis), as a vast excess of the tested target remains intact. It is also not clear why a large proportion of NFR5 is unaffected by the proteolytic activity of NopT. This is particularly interesting in Nicotiana in the absence of Nod factor that could trigger NFR1 kinase activity.

      Thank you for your comments regarding the cleavage of NFR5 by NopT and its functional implications. We acknowledge that our immunoblots show that only a relatively small proportion of NFR5 is cleaved. Possible explanations are as follows:

      (1) The presence of full-length NFR5 does not preclude a significant impact of NopT on the function of NFR5, as NopT is able to interact with NFR5. In other words, the NopT-NFR5 and NopT-NFR1 interactions at the plasma membrane might influence the function of the NFR1/NFR5 receptor without proteolytic cleavage of NFR5. In fact, protease-dead NopT<sup>C93S</sup> expressed in NGR234ΔnopT showed certain effects in L. japonicus (fewer infection foci were formed than with NGR234ΔnopT; Fig. 5E). In this context, it is worth mentioning that the non-acylated NopT<sup>C93S</sup> (Fig. 1B) and NopT<sub>USDA257</sub> (Fig. 6B) proteins were unable to suppress NFR1/NFR5-induced cell death in N. benthamiana, but this could be explained by the lack of acylation and altered subcellular localization.

      (2) In the cleavage assay, only a small portion of NFR5 was detectably cleaved by NopT. However, this cleavage might be sufficient to suppress signaling pathways, leading to the observed phenotypic changes (loss of cell death in N. benthamiana; altered infection in L. japonicus). We agree that this is an important point; therefore, we carefully revised our conclusions on it. Throughout the paper, we now state that the cleavage of NFR5 suppresses symbiotic signaling but does not disrupt it. We also removed the conclusion that cleavage of NFR5 by NopT results in loss of NFR5 function.

      (3) Co-expression of NFR1/NFR5 in N. benthamiana leads to strong cell death, which suggests that NFR1 kinase activity might be constitutively active even in the absence of Nod factors. Why co-expression of these symbiotic receptors leads to cell death, and how the kinase can be active in the absence of Nod factors, remain unclear and are of great interest for future study.

      (4) The proteolytic activity of NopT may be reduced by the interaction of NopT with other proteins such as NFR1, which phosphorylates NopT and inactivates its protease activity.

      In our revised manuscript version, we now provide quantitative data on the efficiency of NFR5 cleavage by NopT in the different expression systems used (Figure 3 and Supplemental Fig. 16). We have also improved our Discussion in this context.

      Comments on latest version:

      The presentation of the figures and the language has greatly improved and the specific mistakes pointed out in the last review have been corrected. I especially appreciate the new images used to illustrate the observed mutant phenotypes, which are much clearer and easier to understand. The pictures used to illustrate the mutant phenotypes seem to be of more comparable root regions than before. Overall, the requested changes have been implemented, with some exceptions described below.

      • Figure 1: New representative images are shown for BAX1 and CERK1. These pictures are more consistent with the phenotype seen in other treatments, but since the data has not changed, I presume the data from leaf discs (where the leaf discs for these treatments looked very different) previously shown is still included. The criteria for what was considered cell death is in my opinion still not described in the legend. The cell death/total ratio has been added for all leaf discs, as requested.

      Thank you for carefully pointing this out. Cell death in leaf discs results in the formation of necrotic plaques, which restrain pathogens within dead cells. These plaques commonly manifest as leaf dehydration, frequently accompanied by a translucent appearance. Brown and shriveled leaf discs serve as indicators of cell death. We have added these descriptions to the legend of Figure 1.

      • Figure 2: the discussion of the figure now emphasizes direct protein interaction. There is still no size marker in 2D or a description of size in the figure legend, making it difficult to compare the result to Figure 3. If I understand the rebuttal comments correctly, there are other bands on the blot, including non-specific bands. This does not negate the need to include the full blot as a supplemental figure to show cleaved NFR5 as well as other bands. I do not see any other clarifications on this subject in the manuscript.

      Thank you for your suggestion. In the revised manuscript, we have included the kDa range for all proteins detected in Figure 2D. The full blot of the Co-IP assay is shown in Fig. S2 (new supplemental data). We did detect some smaller bands on the immunoblot, but we cannot draw a clear conclusion about their identity based on the current study. Interestingly, these smaller bands were immunoprecipitated by anti-FLAG beads, suggesting that they may be truncated NFR5 peptides.

      • Figure 5: From the pictures, it is now easier to understand what is meant by "infection foci". Although there is no description in the methods of how these were distinguished from infection threads, I believe the images are clear enough.

      Thank you for your helpful comment. In the revised manuscript, we have added a description of this experiment to the Methods section and to the legend of Figure 5A.

      • Figure 6: The changes in the discussion are appreciated, but panel E still misrepresents the evidence in the paper, as from the drawing it still seems that the cleaved NFR5 is somehow directly responsible for suppressing infection when this was not shown.

      Thank you for your thoughtful comments. We appreciate your suggestion regarding the schematic model of NFR5 cleavage and the suppression of rhizobial infection. In the revised manuscript, we have changed the model in Figure 6E.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents data demonstrating NopT's interaction with Nod Factor Receptors NFR1 and NFR5 and its impact on cell death inhibition and rhizobial infection. The identification of a truncated NopT variant in certain Sinorhizobium species adds an interesting dimension to the study. These data try to bridge the gaps between classical Nod-factor-dependent nodulation and T3SS NopT effector-dependent nodulation in legume-rhizobium symbiosis. Overall, the research provides interesting insights into the molecular mechanisms underlying symbiotic interactions between rhizobia and legumes.

      Strengths:

      The manuscript nicely demonstrates NopT's proteolytic cleavage of NFR5, regulated by NFR1 phosphorylation, promoting rhizobial infection in L. japonicus. Intriguingly, authors also identify a truncated NopT variant in certain Sinorhizobium species, maintaining NFR5 cleavage but lacking NFR1 interaction. These findings bridge the T3SS effector with the classical Nod-factor-dependent nodulation pathway, offering novel insights into symbiotic interactions.

      Weaknesses:

      (1) In a previous study, when NopT alone was transiently expressed in Nicotiana tabacum plants, proteolytically active NopT elicited a rapid hypersensitive reaction. However, this phenotype was not observed when the same NopT was expressed in Nicotiana benthamiana (Figure 1A). Conversely, cell death and a hypersensitive reaction were observed in Figure S8. This raises questions about the suitability of the exogenous expression system for studying NopT proteolysis specificity.

      We appreciate your attention to these plant-specific differences. Previous studies showed that NopT expressed in tobacco (N. tabacum) or in specific Arabidopsis ecotypes (with PBS1/RPS5 genes) causes rapid cell death (Dai et al. 2008; Khan et al. 2022). Khan et al. 2022 recently reported that cell death does not occur in N. benthamiana unless the leaves are transformed with PBS1/RPS5 constructs. Our data shown in Fig. S17 confirm these findings. As cell death is usually associated with the induction of plant protease activities, we considered N. tabacum and A. thaliana plants unsuitable for testing NFR5 cleavage by NopT. In fact, no NopT/NFR5 experiments were performed with these plants in our study. In response to your comment, we now better describe the N. benthamiana expression system and cite the previous articles. Furthermore, we have revised the Discussion section to better emphasize effector-induced immunity in non-host plants and the negative effect of rhizobial effectors during symbiosis. We believe these revisions provide a clearer understanding of the advantages and limitations of the N. benthamiana expression system.

      (2) NFR5 loss-of-function mutants do not produce nodules in the presence of rhizobia in Lotus roots, and overexpression of NFR1 and NFR5 produces spontaneous nodules. In this regard, if the direct proteolysis target of NopT is NFR5, one would expect NGR234 infection to be largely unsuccessful because of native NopT's specific proteolytic activity against NFR5 and NFR1. However, the authors observed different results in Figure 5.

      Thank you for this comment, which points out that we did not address this aspect precisely enough in the original manuscript version. We improved our manuscript and now write that nfr1 and nfr5 mutants do not produce nodules (Madsen et al., 2003; Radutoiu et al., 2003) and that over-expression of either NFR1 or NFR5 can activate NF signaling, resulting in the formation of spontaneous nodules in the absence of rhizobia (Ried et al., 2014). In fact, compared to the nopT knockout mutant NGR234ΔnopT, wildtype NGR234 (with NopT) is less successful in inducing infection foci in root hairs of L. japonicus (Fig. 5). With respect to the formation of nodule primordia, we repeated our inoculation experiments with NGR234ΔnopT and wildtype NGR234 and also included a nopT-over-expressing NGR234 strain in the analysis. Our data clearly showed that nodule primordium formation was negatively affected by NopT. The new data are shown in Fig. 5 of our revised version. Our data show that NGR234 infection is only partially successful, especially when NopT is over-expressed. This is consistent with our observations that NopT targets Nod factor receptors in L. japonicus and inhibits NF signaling (NIN promoter-GUS experiments). Our findings indicate that NopT might be an “Avr effector” for L. japonicus. However, in other host plants of NGR234, NopT has a symbiosis-promoting role (Dai et al. 2008; Kambara et al. 2009). Such differences could be explained by different NopT targets in different plants (in addition to Nod factor receptors), which may influence the outcome of the infection process. Indeed, our work shows that NopT can interact with various kinase-dead LysM domain receptors, suggesting a role of NopT in suppression or activation of plant immune responses depending on the host plant.
      We discuss such alternative mechanisms in our revised manuscript version and emphasize the need for further investigation to elucidate the precise mechanisms underlying the observed infection phenotype and the role of NopT in modulating symbiotic signaling pathways. In this context, we would also like to mention the new figures in our manuscript, which show (i) the efficiency of NFR5 cleavage by NopT in different expression systems (Figure 3), (ii) the interaction between NopT<sup>C93S</sup> and His-SUMO-NFR5JM-GFP (Supplementary Fig. 5), and (iii) cleavage of His-SUMO-NFPJM-GFP by NopT (Supplementary Figs. S8 and S9).

      (3) In Figure 6E, the model illustrates how NopT digests NFR5 to regulate rhizobial infection. However, it raises the question of whether it is reasonable for NGR234 to produce an effector that restricts its own colonization of host plants.

      Thank you for mentioning this point. We are aware of the possible paradox that the broad-host-range strain NGR234 produces an effector that appears to restrict its infection of host plants. As mentioned in our answer to the previous comment, NopT could have additional functions beyond the regulation of Nod factor signaling. In our revised manuscript version, we have modified our text as follows:

      (1) We mention the potential evolutionary aspects of NopT-mediated regulation of rhizobial infection and discuss the possibility that interactions between NopT and Nod factor receptors may have evolved to fine-tune Nod factor signaling to avoid rhizobial hyperinfection in certain host legumes.

      (2) We also emphasize that the presence of NopT may confer selective advantages in host plants other than L. japonicus due to interactions with proteins related to plant immunity. Like other effectors, NopT could suppress activation of immune responses (suppression of PTI) or cause effector-triggered immunity (ETI) responses, thereby modulating rhizobial infection and nodule formation. Interactions between NopT and proteins related to the plant immune system may represent an important evolutionary driving force for host-specific nodulation and explain why the presence of NopT in NGR234 has a negative effect on symbiosis with L. japonicus but a positive one with other legumes.

      (4) The failure to generate stable transgenic plants expressing NopT in Lotus japonicus is surprising, considering the manuscript's claim that NopT specifically proteolyzes NFR5, a major player in the response to nodule symbiosis, without being essential for plant development.

      We also thank you for this comment. We have revised the Discussion section of our manuscript and now discuss our failure to generate stable transgenic L. japonicus plants expressing NopT. We observed that the protease activity of NopT in aerial parts of L. japonicus had a negative effect on plant development, whereas NopT expression in hairy roots was possible. Such differences may be explained by different NopT substrates in roots and aerial parts of the plant. In this context, we also discuss our finding that NopT not only cleaves NFR5 but is also able to proteolyze other proteins of L. japonicus such as LjLYS11, suggesting that NopT not only suppresses Nod factor signaling, but may also interfere with signal transduction pathways related to plant immunity. We speculate that, depending on the host legume species, NopT could suppress PTI or induce ETI, thereby modulating rhizobial infection and nodule formation.

      Comments on revised version:

      This version has effectively addressed most of my concerns. However, one key issue remains unresolved regarding the mechanism of NopT in regulating nodule symbiosis. Specifically, the explanation of how NopT catabolizes NFR5 to regulate symbiosis is still not convincing within the current framework of plant-microbe interaction, where plants are understood to genetically control rhizobial colonization.

      While alternative regulatory mechanisms in plant-microbe interactions are plausible, the notion that the NGR234-secreted effector NopT could reduce its own infection by either suppressing plant immunity or degrading the symbiosis receptor remains unsubstantiated. I believe further revisions are needed in the discussion section to more clearly address and clarify these findings and any lingering uncertainties.

      We appreciate your comments on why NopT cleaves NFR5 to regulate symbiosis. NopT belongs to the YopT family of pathogen effectors and also cleaves Arabidopsis AtLYK5 and L. japonicus LjLYS11, which are involved in triggering immune responses in plants. NFR5, AtLYK5, and LjLYS11 share a conserved amino acid motif in the juxtamembrane domain, which may explain why NopT cleaves NFR5 during symbiosis. Similarly, in plant-pathogen interactions, the effector HopB1 cleaves the immune co-receptor BAK1 in its kinase domain to inhibit plant defense. The consequences of receptor cleavage may therefore be either positive or negative. By suppressing symbiotic signaling, NopT may prevent hyperinfection in specific interactions between rhizobia and legumes. In the Discussion of the revised manuscript, we now explain more clearly why NopT could reduce infection by its own strain, either by suppressing plant immunity or by degrading the symbiosis receptor.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Evaluation of the author's responses to the reviewer comments during the first review round

      Reviewer's Comment:

      Regardless of whether Nicotiana leaf cells or Lotus root cells are used as the test platform, the Western blots indicate that only a small proportion of NFR5 is cleaved when co-expressed with NopT, and most of the NFR5 persists in its full-length form (Figures 3A-D). It is not quite clear how the authors explain the loss of NFR5 function (loss of cell death, impact on symbiosis), as a vast excess of the tested target remains intact. It is also not clear why a large proportion of NFR5 is unaffected by the proteolytic activity of NopT. This is particularly interesting in Nicotiana in the absence of Nod factor that could trigger NFR1 kinase activity.

      Summary of response:

      • NopT could be interfering with the NFR1/NFR5 complex without proteolytic cleavage

      • The cleaved fraction may still be sufficient to disrupt signaling pathways

      • Elevated abundance of NFR5 relative to WT levels

      • Add quantitative data for efficiency of NFR5 cleavage in different systems

      Evaluation of response:

      • The quantification of NFR5 cleavage efficiency is welcome, and there is some discussion of the possible reasons for the large proportion of uncleaved NFR5. It is clear that there is a large difference in cleavage efficiency between L. japonicus roots and N. benthamiana.

      • The data is shown as a bar plot. Given that only 3 biological replicates are used, the data points should be shown, and there is too little data to provide sensible error bars. It would be better to simply make a dot-plot and indicate the mean for each sample. However, the main aim of the comment is addressed.

      Thank you for your constructive comments regarding Figure S16. In the revised manuscript, we now present these data as dot plots.

      Reviewer's Comment:

      It is also difficult to evaluate how the ratios of cleaved and full-length protein change when different versions of NopT are present without a quantification of band strengths normalized to loading controls (Figure 3C, 3D, 3F). The same is true for the blots supporting NFR1 phosphorylation of NopT (Figure 4A).

      Summary of response:

      • Quantified proportion of cleaved and full length NFR5 in different systems (S14)

      • Band strengths of immunoblots quantified (4B)

      Evaluation of response:

      • The quantification has been performed as requested and the data is shown as bar plots. This type of data is frequently displayed as part of the blot figure itself, printed under each respective lane, making it easier for the reader to connect the ratios to the band sizes. If data is shown in a plot, the data points should be shown on the plot, as described above.

      Thank you for your constructive comments regarding Figure 3. In the revised manuscript, we have added the cleavage efficiency values to Figures 3A-3D.

      Reviewer's Comment:

      Nodule primordia and infection threads are still formed when L. japonicus plants are inoculated with ∆nopT mutant bacteria, but it is not clear if these primordia are infected or develop into fully functional nodules (Figure 5). A quantification of the ratio of infected and non-infected nodules and primordia would reveal whether NopT is only active at the transition from infection focus to thread or perhaps also later in the bacterial infection process of the developing root nodule.

      Summary of response:

      • Additional experiments with NGR234 or NGR234ΔnopT mutants find no non-infected nodules (fig. 5)

      Evaluation of response:

      • The requested quantification has been done, although the support for the findings would be stronger if also mature nodules per plant were quantified and plotted. If non-infected nodules were neither present in NGR234 or NGR234ΔnopT, it would still be advisable to include images of cross-sections of the fully-developed nodules.

      Thank you for suggesting cross-sections of the fully developed nodules. In the revised manuscript, we have added cross-section images of nodules in Figure S12.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors used a subset of a very large, previously generated 16S dataset to (1) assess age-associated features; and (2) develop a fecal microbiome clock, based on an extensive longitudinal sampling of wild baboons for which near-exact chronological age is known. They further seek to understand deviation from age-expected patterns and uncover if and why some individuals have an older or younger microbiome than expected, and the health and longevity implications of such variation. Overall, the authors compellingly achieved their goals of discovering age-associated microbiome features and developing a fecal microbiome clock. They also showed clear and exciting evidence for sex- and rank-associated variation in the pace of gut microbiome aging and impacts of seasonality on microbiome age in females. These data add to a growing understanding of modifiers of the pace of aging in primates, and links among different biological indicators of age, with implications for understanding and contextualizing human variation. However, in the current version, there are gaps in the analyses with respect to the social environment, and in comparisons with other biological indicators of age. Despite this, I anticipate this work will be impactful, generate new areas of inquiry, and fuel additional comparative studies.

      Thank you for the supportive comments and constructive reviews.

      Strengths:

      The major strengths of the paper are the size and sampling depth of the study population, including the ability to characterize the social and physical environments, and the application of recent and exciting methods to characterize the microbiome clock. An additional strength was the ability of the authors to compare and contrast the relative age-predictive power of the fecal microbiome clock to other biological methods of age estimation available for the study population (dental wear, blood cell parameters, methylation data). Furthermore, the writing and support materials are clear, informative and visually appealing.

      Weaknesses:

      It seems clear that more could be done in the area of drawing comparisons among the microbiome clock and other metrics of biological age, given the extensive data available for the study population. It was confusing to see this goal (i.e. "(i) to test whether microbiome age is correlated with other hallmarks of biological age in this population"), listed as a future direction, when the authors began this process here and have the data to do more; it would add to the impact of the paper to see this more extensively developed.

      Comparing the microbiome clock to other metrics of biological age in our population is a high priority (these other metrics of biological age are in Table S5 and include epigenetic age measured in blood, the non-invasive physiology and behavior clock (NPB clock), dentine exposure, body mass index, and blood cell counts (Galbany et al. 2011; Altmann et al. 2010; Jayashankar et al. 2003; Weibel et al. 2024; Anderson et al. 2021)). However, we have opted to test these relationships in a separate manuscript. We made this decision because of the complexity of the analytical task: these metrics were not necessarily collected on the same subjects, and when they were, each metric was often measured at a different age for a given animal. Further, two of the metrics (microbiome clock and NPB clock) are measured longitudinally within subjects but on different time scales (the NPB clock is measured annually while microbiome age is measured in individual samples). The other metrics are cross-sectional. Testing the correlations between them will require exploration of how subject inclusion and time scale affect the relationships between metrics.

      We now explain the complexity of this analysis in the discussion in lines 447-450. In addition, we have added the NPB clock (Weibel et al. 2024) to the text in lines 260-262 and to Table S5.

      An additional weakness of the current set of analyses is that the authors did not explore the impact of current social network connectedness on microbiome parameters, despite the landmark finding from members of this authorship studying the same population that "Social networks predict gut microbiome composition in wild baboons" published here in eLife some years ago. While a mother's social connectedness is included as a parameter of early life adversity, overall the authors focus strongly on social dominance rank, without discussion of that parameter's impact on social network size or directly assessing it.

      Thank you for raising this important point, which was not well explained in our manuscript. We find that the signatures of social group membership and social network proximity are only detectable in our population for samples collected close in time. All of the samples analyzed in Tung et al. 2015 (“Social networks predict gut microbiome composition in wild baboons”) were collected within six weeks of each other. By contrast, the data set analyzed here spans 14 years, with very few samples from close social partners collected close in time. Hence, the effects of social group membership and social proximity are weak or undetectable. We described these findings in Grieneisen et al. 2021 and Björk et al. 2022, and we now explain this logic on line 530, which states, “We did not model individual social network position because prior analyses of this data set find no evidence that close social partners have more similar gut microbiomes, probably because we lack samples from close social partners sampled close in time (Grieneisen et al. 2021; Björk et al. 2022).”

      We do find small effects of social group membership, which is included as a random effect in our models of how each microbiome feature is associated with host age (line 529) and our models predicting microbiome Δage (line 606; Table S6).

      Reviewer #2 (Public review):

      Summary:

      Dasari et al present an interesting study investigating the use of 'microbiota age' as an alternative to other measures of 'biological age'. The study provides several curious insights into biological aging. Although 'microbiota age' holds potential as a proxy of biological age, it comes with limitations considering the gut microbial community can be influenced by various non-age related factors, and various age-related stressors may not manifest in changes in the gut microbiota. The work would benefit from a more comprehensive discussion, that includes the limitations of the study and what these mean to the interpretation of the results.

      We agree and have added text to the discussion that expands on the limitations of this study and what those limitations mean for the interpretation of the results. For instance, lines 395-400 read, “Despite the relative accuracy of the baboon microbiome clock compared to similar clocks in humans, our clock has several limitations. First, the clock’s ability to predict individual age is lower than for age clocks based on patterns of DNA methylation—both for humans and baboons (Horvath 2013; Marioni et al. 2015; Chen et al. 2016; Binder et al. 2018; Anderson et al. 2021). One reason for this difference may be that gut microbiomes can be influenced by several non-age-related factors, including social group membership, seasonal changes in resource use, and fluctuations in microbial communities in the environment”

      In addition, lines 405-411 now read, “Third, the relationships between potential socio-environmental drivers of biological aging and the resulting biological age predictions were inconsistent. For instance, some sources of early life adversity were linked to old-for-age gut microbiomes (e.g., males born into large social groups), while others were linked to young-for-age microbiomes (e.g., males who experienced maternal social isolation or early life drought), or were unrelated to gut microbiome age (e.g., males who experienced maternal loss; any source of early life adversity in females).”

      Strengths:

      The dataset this study is based on is impressive, and can reveal various insights into biological ageing and beyond. The analysis implemented is extensive and high-level.

      Weaknesses:

      The key weakness is the use of microbiota age instead of e.g., DNA-methylation-based epigenetic age as a proxy of biological ageing, for reasons stated in the summary. DNA methylation levels can be measured from faecal samples, and as such epigenetic clocks too can be non-invasive. I will provide authors a list of minor edits to improve the read, to provide more details on Methods, and to make sure study limitations are discussed comprehensively.

      Thank you for this point. In response, we have deleted the text from the discussion that stated that non-invasive sampling is an advantage of microbiome clocks. In addition, we now propose a non-invasive epigenetic clock from fecal samples as an important future direction for our population (see line 450).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Abstract - The opening 2 sentences are not especially original or reflective of the potential value/ premise of the study. Members of this team have themselves measured variation in biological age in many different ways, and the implication that measuring a microbiome clock is easy or straightforward is not compelling. This paper is very interesting and provides unique insight, but I think overall there is a missed opportunity in the abstract to emphasize this, given the innovative science presented here. Furthermore, the last 2 sentences of the abstract are especially interesting - but missing a final statement on the broader significance of research outside of baboons.

      We appreciate these comments and have revised the Abstract accordingly. The introductory sentences now read, “Mammalian gut microbiomes are highly dynamic communities that shape and are shaped by host aging, including age-related changes to host immunity, metabolism, and behavior. As such, gut microbial composition may provide valuable information on host biological age.” (lines 31-34). The last two sentences of the abstract now read, “Hence, in our host population, gut microbiome age largely reflects current, as opposed to past, social and environmental conditions, and does not predict the pace of host development or host mortality risk. We add to a growing understanding of how age is reflected in different host phenotypes and what forces modify biological age in primates.” (lines 40-43).

      If possible, it would be highly useful to present some comments on concordance in patterns at different levels. Are all ASVs assessed at both the family and genus levels? Do they follow similar patterns when assessed at different levels? What can we learn about the system by looking at different levels of taxonomic assignment?

      The section on relationships between host age and individual microbiome features is already lengthy, so we have not added an analysis of concordance between different taxonomic levels. However, we added a justification for why we tested for age signatures in different levels of taxa to line 171, which reads, “We tested these different taxonomic levels in order to learn whether the degree to which coarse and fine-grained designations categories were associated with host age.”

      To calculate the delta age, please clarify whether this was done at the level of years, as suggested in Figure 3C, or at the level of months or portions of months.

      Delta age is measured in years. This is now clarified in lines 294, 295, and 578.

      Spelling mistake in table S12, cell B4 (Octovber)

      Thank you. This typo has been corrected.

      Given that the introduction starts with vertebrates, the second paragraph needs some tweaking to be appropriate. Perhaps, "At least among mammals, one valuable marker of biological aging may lie in the composition and dynamics of the mammalian gut microbiome (7-10)." Or simply remove "mammalian".

      We have updated this sentence based on your suggestions in line 54. It reads, “In mammals, one valuable marker of biological aging may lie in the composition and dynamics of the gut microbiome (Claesson et al. 2012; Heintz and Mair 2014; O’Toole and Jeffery 2015; Sadoughi et al. 2022).”

      A rewrite at the end of the introduction is needed to avoid the almost direct repetition in lines 115-118 and 129-131 (including lit cited). One potentially effective way to approach this is to keep the predictions in the earlier paragraph and then more clearly center the approach and the overarching results statement in the latter paragraph. (I.e., "we find that season and social rank have stronger effects on microbiome age than early life events. Further, microbiome age does not predict host development or mortality.").

      Thank you for pointing this out. We have re-organized the predictions in the introduction based on your suggestion. The alternative “recency effects” model now appears in the paragraph that starts in line 110. The final paragraph then centers on the overall approach and the results statement (lines 128-140).

      Be clear in each case where taxon-level trends are discussed if it's at Family, Genus, or other level. It's there most, but not all, of the time.

      We have gone through the text and clarified what taxa or microbiome feature was the subject of our analyses in any places where this was not clear.

      In the legend for Figure 2, add clarification for how values to right versus left of the centered value should be interpreted with respect to age (e.g. "values to x of the center are more abundant in older individuals").

      We now clarify in Figure 2C and 2D that “Positive values are more abundant in older hosts”.

      Figure 3 - Are Panels A, B, and C all needed - can the value for all individuals not also be overlaid in the panel showing sex differences and the same point showing individuals with "old" and "young" microbiomes be added in the same plot if it was slightly larger?

      We agree and have simplified Figure 3. We reduced the number of panels from three to two, and we added the information about how to calculate delta age to Panel A. We also moved the equation from the top of Panel C to the bottom right of Panel A.

      Reviewer #2 (Recommendations for the authors):

      Dasari et al. present an interesting study investigating the use of 'microbiota age' as an alternative to other measures of 'biological age'. The study provides several curious insights which in principle warrant publication. However, I do think the manuscript should be carefully revised. Below I list some minor revisions that should be implemented. Importantly, the authors should discuss in the Discussion the pros and cons of using 'microbiota age' as a proxy of 'biological age'. Further, the authors should provide more information on Methods, to make sure the study can be replicated.

      Thank you for these important points. Based on your comments and those of the first reviewer, we have expanded our discussion of the limitations of using microbiota age as a proxy for biological age (see edits to the paragraph starting in line 395).

      We have also expanded our methods around sample collection, DNA extraction, and sequencing to describe our sampling methods, strategies to mitigate and address possible contamination, and batch effects. See lines 483-490 and our citations to the original papers where these methods are described in detail.

      (1) Lines 85-99: I think this paragraph could be revisited to make the assumptions clearer. For instance, the last sentence is currently a little confusing: are authors expecting males to exhibit old-for-age microbiomes already during the juvenile period?

      This prediction has been clarified. Line 96 now reads, “Hence, we predicted that adult male baboons would exhibit gut microbiomes that are old-for-age, compared to adult females (by contrast, we expected no sex effects on microbiome age in juvenile baboons).”

      (2) Lines 118-121: Could the authors discuss this assumption in relation to what has been observed e.g., in humans in terms of delays in gut microbiome development? Delayed/accelerated gut microbiome development has been studied before, so this assumption would be stronger if related to what we know from previous studies.

      This comment refers to the sentence which originally stated, “However, we also expected that some sources of early life adversity might be linked to young-for-age gut microbiota. For instance, maternal social isolation might delay gut microbiome development due to less frequent microbial exposures from conspecifics.” We have slightly expanded the text here (line 117) to explain our logic. We now include citations for our predictions. We did not include a detailed discussion of prior literature on microbiome development in the interest of keeping the same level of detail across all sections on our predictions.

      (3) As the authors discuss, various adversities can lead to old-for-age but also young-for-age microbiome composition. This should be discussed in the limitations.

      We agree. This is now discussed in the sentence starting at line 371, which reads, “…deviations from microbiome age predictions are explained by socio-environmental conditions experienced by individual hosts, especially recent conditions, although the effect sizes are small and are not always directionally consistent.” In addition, the text starting at line 405 now reads, “Third, the relationships between potential socio-environmental drivers of biological aging and the resulting biological age predictions were inconsistent. For instance, some sources of early life adversity were linked to old-for-age gut microbiomes (e.g., males born into large social groups), while others were linked to young-for-age microbiomes (e.g., males who experienced maternal social isolation or early life drought), or were unrelated to gut microbiome age (e.g., males who experienced maternal loss; any source of early life adversity in females).”

      (4) In various places, e.g., lines 129-131, it is a little unclear at what chronological age authors are expecting microbiota to appear young/old-for-age.

      This sentence was removed while responding to the comments from the first reviewer.

      (5) Lines 132-133: this statement could be backed by stating that this is because the gut microbiota can change rapidly e.g., when diet changes (or whatever the authors think could be behind this).

      We have added an expository sentence at line 123, including new citations. This sentence reads, “Indeed, gut microbiomes are highly dynamic and can change rapidly in response to host diet or other aspects of host physiology, behavior, or environments”.

      We now cite:

      · Hicks, A.L., et al. (2018). Gut microbiomes of wild great apes fluctuate seasonally in response to diet. Nature Communications 9, 1786.

      · Kolodny, O., et al. (2019). Coordinated change at the colony level in fruit bat fur microbiomes through time. Nature Ecology & Evolution 3, 116-124.

      · Risely, A., et al. (2021). Diurnal oscillations in gut bacterial load and composition eclipse seasonal and lifetime dynamics in wild meerkats. Nature Communications 12, 6017.

      (6) Lines 135-137: current or past season and social rank? This paragraph introduces the idea that it could be past rather than current socio-environmental factors that might predict microbiota age, so the authors should clarify this sentence.

      We have clarified the information in this sentence. Line 135 now reads, “In general, our results support the idea that a baboon’s current socio-environmental conditions, especially their current social rank and the season of sampling, have stronger effects on microbiome age than early life events—many of which occurred many years prior to sampling.”

      (7) Lines 136-137: this sentence could include some kind of a conclusion of this finding. What might this mean?

      We have added a sentence at line 138, which speculates that, “…the dynamism of the gut microbiome may often overwhelm and erase early life effects on gut microbiome age.”

      (8) Use 'microbiota' or 'microbiome' across the manuscript; currently, the terms are used interchangeably. I don't have a strong opinion on this, although typically 'microbiota' is used when data comes from 16S rRNA.

      We have updated the text to replace any instance of “microbiota” with “microbiome”. We use the term microbiome in the sense of this definition from the National Human Genome Research Institute, which defines a microbiome as “the community of microorganisms (such as fungi, bacteria and viruses) that exists in a particular environment”.

      (9) Figure 1 legend: make sure to unify formatting; e.g., present sample sizes as N= or n=, rather than both, and either include or do not include commas in 4-digit values (sample sizes).

      We have checked the formatting related to sample sizes and the use of commas in 4-digit values in the main text and supplement. The formats are now consistent.

      (10) Line 166: relative abundances surely?

      Following Gloor et al. (2017), our analyses use centered log-ratio (CLR) transformations of read counts, which is the recommended approach for compositional data such as 16S rRNA amplicon read counts. CLR transformations are scale-invariant, so the same ratio is obtained in a sample with few reads versus many reads. We now cite Gloor et al. (2017) at line 169 and in the methods in line 517, which reads “centered log ratio (CLR) transformed abundances (i.e., read counts) of each microbial phylum (n=30), family (n=290), genus (n=747), and amplicon sequence variant (ASV) detected in >25% of samples (n=358). CLR transformations are a recommended approach for addressing the compositional nature of 16S rRNA amplicon read count data (Gloor et al. 2017).”
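      The CLR transformation described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than the authors' pipeline: the function name `clr` and the pseudocount used to handle zero counts are assumptions, since the source does not say how zeros were treated. Each count is log-transformed and the mean of the logs (the log of the sample's geometric mean) is subtracted, so two samples with the same proportions but different sequencing depths yield the same values.

```python
import math
from typing import List, Sequence


def clr(counts: Sequence[float], pseudocount: float = 0.5) -> List[float]:
    """Centered log-ratio (CLR) transform of one sample's read counts.

    Assumption (not from the source): a pseudocount is added because the
    log of a zero count is undefined.
    """
    shifted = [c + pseudocount for c in counts]
    if any(x <= 0 for x in shifted):
        raise ValueError("all counts must be positive after adding the pseudocount")
    logs = [math.log(x) for x in shifted]
    # Mean of the logs equals the log of the sample's geometric mean.
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical counts: same proportions, 10x the sequencing depth.
shallow = clr([10, 20, 70], pseudocount=0)
deep = clr([100, 200, 700], pseudocount=0)
print([round(v, 3) for v in shallow])
print([round(v, 3) for v in deep])
```

      Both printed lists are identical, which is the scale invariance noted above. With `pseudocount=0` the invariance is exact; any positive pseudocount makes it only approximate for low-count features.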

      (11) Lines 167-172: were technical factors, e.g., read depth or sequencing batch, included as random effects?

      Thank you for catching this oversight in the text. We did model sequencing depth and batch effects. The sentence starting at line 173 now reads, “For each of these 1,440 features, we tested its association with host age by running linear mixed effects models that included linear and quadratic effects of host age and four other fixed effects: sequencing depth, the season of sample collection (wet or dry), the average maximum temperature for the month prior to sample collection, and the total rainfall in the month prior to sample collection (Grieneisen et al. 2021; Björk et al. 2022; Tung et al. 2015). Baboon identity, social group membership, hydrological year of sampling, and sequencing plate (as a batch effect) were modeled as random effects.”

      (12) Lines 175-180: When discussing how these alpha diversity results relate to previous findings, the authors should be clear about whether they talk about weighted or non-weighted measures of alpha diversity. - also maybe this should be included in the discussion rather than the results? Please consider this when revisiting the manuscript (see how it reads after edits).

      Richness is the only unweighted metric, which we now clarify in line 181. We opted to retain the interpretation in the text in its original location to maintain the emphasis in the discussion on the microbiome clock results.

      (13) Table S1 is very hard to interpret in the provided PDF format as columns are not presented side-by-side. It is currently hard to check model output for e.g., specific families. This needs to be revisited.

      We agree. We believe that eLife’s submission portal automatically generates a PDF for any supplementary item. However, we also include the supplementary tables as an Excel workbook which has the columns presented side-by-side.

      (14) Line 184: taxa meaning what? Unclear what authors refer to with this sentence, taxa across taxonomic levels, or ASVs, or what does the 51.6% refer to?

      We have edited line 191 to clarify that this sentence refers to taxa at all taxonomic levels (phyla to ASVs).

      (15) Line 191: a punctuation mark missing after ref (81).

      We have added the missing period at the end of this sentence.

      (16) Lines 189-197: this should go into the discussion in my opinion.

      We have opted to retain this interpretation, now at line 183.

      (17) Lines 215-219: Not sure what this means; do the authors mean features were not restricted to age-associated taxa, ie also e.g., diversity and other taxa-independent patterns were included? If so, the rest of the highlighted lines should be revisited to make this clear, currently to me it is very unclear what 'These could include features that are not strongly age-correlated in isolation' means. Currently, that sounds like some features included were only age-associated in combination with other features, but unclear how this relates to taxa-dependency/taxa-independency.

      We agree this was not clear. We have revised line 224 to read, “We included all 9,575 microbiome features in our age predictions, as opposed to just those that were statistically significantly associated with age because removing these non-significant features could exclude features that contribute to age prediction via interactions with other taxa.”

      (18) Line 403-407: There is now a paper showing epigenetic clocks can be built with faecal samples, so this argument is not valid. Please revisit in light of this publication: https://onlinelibrary.wiley.com/doi/epdf/10.1111/mec.17330

      Thank you for bringing this paper to our attention. We deleted the text that describes epigenetic clocks as invasive, and we now cite this paper in line 450, which reads, “We also hope to measure epigenetic age in fecal samples, leveraging methods developed in Hanski et al. 2024.”

      (19) Line 427: a punctuation mark/semicolon missing before However.

      We have corrected this typo.

      (20) Lines 419-428: I don't quite understand this speculation. Why would the priority of access to food lead to an old-looking gut microbiome? This paragraph needs stronger arguments, currently unclear and also not super convincing.

      We agree this was confusing. We have revised this text to clarify the explanation. The text starting at line 424 now reads, “This outcome points towards a shared driver of high social status in shaping gut microbiome age in both males and females. While it is difficult to identify a plausible shared driver, one benefit shared by both high-ranking males and females is priority of access to food. This access may result in fewer foraging disruptions and a higher quality, more stable diet. At the same time, prior research in Amboseli suggests that as animals age, their diets become more canalized and less variable (Grieneisen et al. 2021). Hence aging and priority of access to food might both be associated with dietary stability and old-for-age microbiomes. However, this explanation is speculative and more work is needed to understand the relationship between rank and microbiome age.”

      (21) Line 434: remove 'be'.

      We have corrected this typo.

      (22) Line 478: add information on how samples were collected; e.g., were samples collected from the ground? How was cross-contamination with soil microbiota minimised? Were samples taken from the inner part of depositions? These factors can influence microbiota samples quite drastically so detailed info is needed. Also what does homogenisation mean in this context? How soon were samples freeze-dried after sample collection?

      We have expanded our methods with respect to sample collection. This text starts in line 483 and reads, “Samples were collected from the ground within 15 minutes of defecation. For each sample, approximately 20 g of feces was collected into a paper cup, homogenized by stirring with a wooden tongue depressor, and a 5 g aliquot of the homogenized sample was transferred to a tube containing 95% ethanol. While a small amount of soil was typically present on the outside of the fecal sample, mammalian feces contains 1000 times the number of microbial cells in a typical soil sample (Sender, Fuchs, and Milo 2016; Raynaud and Nunan 2014), which overwhelms the signal of soil bacteria in our analyses (Grieneisen et al. 2021). Samples were transported from the field in Amboseli to a lab in Nairobi, freeze-dried, and then sifted to remove plant matter prior to long term storage at -80°C.”

      (23) Line 480 onwards: were negative controls included in extraction batches? Were samples randomised into extraction batches?

      Yes, we included extraction blanks. These are now described in lines 495-500. This text reads, “We included one extraction blank per batch, which had significantly lower DNA concentrations than sample wells (t-test; t = -50, p < 2.2×10<sup>-16</sup>; Grieneisen et al. 2021). We also included technical replicates, which were the same fecal sample sequenced across multiple extraction and library preparation batches. Technical replicates from different batches clustered with each other rather than with their batch, indicating that true biological differences between samples are larger than batch effects.”

      (24) Were extraction, library prep, and sequencing negative controls included? Is data available?

      We included extraction blanks (described above) and technical replicates, which were the same sample sequenced across multiple extraction and library preparation batches. Technical replicates from different batches clustered with each other rather than with their batch, indicating that true biological differences between samples are larger than batch effects.

      We have updated the data availability statement to read, “All data for these analyses are available on Dryad at https://doi.org/10.5061/dryad.b2rbnzspv. The 16S rRNA gene sequencing data are deposited on EBI-ENA (project ERP119849) and Qiita (study 12949). Code is available at the following GitHub repository: https://github.com/maunadasari/Dasari_etal-GutMicrobiomeAge”.

      (25) Line 562: how were corrected microbiome delta ages calculated? Currently, the authors state x, y and z factors were corrected for, but it is unclear how this was done.

      The paragraph starting at line 577 describes how microbiome delta age was calculated. We have made only a few changes to this text because we were not sure which aspects of these methods confused the reviewer. However, briefly, we calculated sample-specific microbiome Δage in years as the difference between a sample’s microbial age estimate, age<sub>m</sub>, from the microbiome clock, and the host’s chronological age in years at the time of sample collection, age<sub>c</sub>. Higher microbiome Δages indicate old-for-age microbiomes, as age<sub>m</sub> > age<sub>c</sub>, and lower values (which are often negative) indicate a young-for-age microbiome, where age<sub>c</sub> > age<sub>m</sub> (see Figure 3).
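      The Δage definition above reduces to a subtraction in years. The Python sketch below is a hypothetical illustration (the function names and the sample ages are ours, not taken from the authors' code or data) that computes a sample-level Δage and labels its sign following the interpretation in the text. It does not reproduce the microbiome clock itself or the corrected Δage values.

```python
def microbiome_delta_age(age_m: float, age_c: float) -> float:
    """Delta age in years: clock-predicted age (age_m) minus chronological age (age_c)."""
    return age_m - age_c


def describe_delta_age(delta: float) -> str:
    """Label the sign of delta age as described in the text (labels are illustrative)."""
    if delta > 0:
        return "old-for-age"    # age_m > age_c
    if delta < 0:
        return "young-for-age"  # age_c > age_m
    return "matches chronological age"


# Hypothetical sample: the clock predicts 12.5 years for a 10-year-old host.
delta = microbiome_delta_age(12.5, 10.0)
print(delta, describe_delta_age(delta))  # 2.5 old-for-age
```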

      (26) Line 579: typo 'as'.

      We have corrected this typo.

      Works Cited

      Altmann, Jeanne, Laurence Gesquiere, Jordi Galbany, Patrick O Onyango, and Susan C Alberts. 2010. “Life History Context of Reproductive Aging in a Wild Primate Model.” Annals of the New York Academy of Sciences 1204:127–38. https://doi.org/10.1111/j.1749-6632.2010.05531.x.

      Anderson, Jordan A, Rachel A Johnston, Amanda J Lea, Fernando A Campos, Tawni N Voyles, Mercy Y Akinyi, Susan C Alberts, Elizabeth A Archie, and Jenny Tung. 2021. “High Social Status Males Experience Accelerated Epigenetic Aging in Wild Baboons.” Edited by George H Perry. eLife 10 (April):e66128. https://doi.org/10.7554/eLife.66128.

      Binder, Alexandra M., Camila Corvalan, Verónica Mericq, Ana Pereira, José Luis Santos, Steve Horvath, John Shepherd, and Karin B. Michels. 2018. “Faster Ticking Rate of the Epigenetic Clock Is Associated with Faster Pubertal Development in Girls.” Epigenetics 13 (1): 85–94. https://doi.org/10.1080/15592294.2017.1414127.

      Björk, Johannes R., Mauna R. Dasari, Kim Roche, Laura Grieneisen, Trevor J. Gould, Jean-Christophe Grenier, Vania Yotova, et al. 2022. “Synchrony and Idiosyncrasy in the Gut Microbiome of Wild Baboons.” Nature Ecology & Evolution, June, 1–10. https://doi.org/10.1038/s41559-022-01773-4.

      Chen, Brian H., Riccardo E. Marioni, Elena Colicino, Marjolein J. Peters, Cavin K. Ward-Caviness, Pei-Chien Tsai, Nicholas S. Roetker, et al. 2016. “DNA Methylation-Based Measures of Biological Age: Meta-Analysis Predicting Time to Death.” Aging (Albany NY) 8 (9): 1844–59. https://doi.org/10.18632/aging.101020.

      Claesson, Marcus J., Ian B. Jeffery, Susana Conde, Susan E. Power, Eibhlís M. O’Connor, Siobhán Cusack, Hugh M. B. Harris, et al. 2012. “Gut Microbiota Composition Correlates with Diet and Health in the Elderly.” Nature 488 (7410): 178–84. https://doi.org/10.1038/nature11319.

      Galbany, Jordi, Jeanne Altmann, Alejandro Pérez-Pérez, and Susan C. Alberts. 2011. “Age and Individual Foraging Behavior Predict Tooth Wear in Amboseli Baboons.” American Journal of Physical Anthropology 144 (1): 51–59. https://doi.org/10.1002/ajpa.21368.

      Gloor, Gregory B., Jean M. Macklaim, Vera Pawlowsky-Glahn, and Juan J. Egozcue. 2017. “Microbiome Datasets Are Compositional: And This Is Not Optional.” Frontiers in Microbiology 8: 2224. https://doi.org/10.3389/fmicb.2017.02224.

      Grieneisen, Laura E., Mauna Dasari, Trevor J. Gould, Johannes R. Björk, Jean-Christophe Grenier, Vania Yotova, David Jansen, et al. 2021. “Gut Microbiome Heritability Is Nearly Universal but Environmentally Contingent.” Science 373 (6551): 181–86. https://doi.org/10.1126/science.aba5483.

      Hanski, Eveliina, Susan Joseph, Aura Raulo, Klara M. Wanelik, Áine O’Toole, Sarah C. L. Knowles, and Tom J. Little. 2024. “Epigenetic Age Estimation of Wild Mice Using Faecal Samples.” Molecular Ecology 33 (8): e17330. https://doi.org/10.1111/mec.17330.

      Heintz, Caroline, and William Mair. 2014. “You Are What You Host: Microbiome Modulation of the Aging Process.” Cell 156 (3): 408–11. https://doi.org/10.1016/j.cell.2014.01.025.

      Horvath, Steve. 2013. “DNA Methylation Age of Human Tissues and Cell Types.” Genome Biology 14 (10): R115. https://doi.org/10.1186/gb-2013-14-10-r115.

      Jayashankar, Lakshmi, Kathleen M. Brasky, John A. Ward, and Roberta Attanasio. 2003. “Lymphocyte Modulation in a Baboon Model of Immunosenescence.” Clinical and Vaccine Immunology 10 (5): 870–75. https://doi.org/10.1128/CDLI.10.5.870-875.2003.

      Marioni, Riccardo E., Sonia Shah, Allan F. McRae, Brian H. Chen, Elena Colicino, Sarah E. Harris, Jude Gibson, et al. 2015. “DNA Methylation Age of Blood Predicts All-Cause Mortality in Later Life.” Genome Biology 16 (1): 25. https://doi.org/10.1186/s13059-015-0584-6.

      O’Toole, Paul W., and Ian B. Jeffery. 2015. “Gut Microbiota and Aging.” Science 350 (6265): 1214–15. https://doi.org/10.1126/science.aac8469.

      Raynaud, Xavier, and Naoise Nunan. 2014. “Spatial Ecology of Bacteria at the Microscale in Soil.” PLOS ONE 9 (1): e87217. https://doi.org/10.1371/journal.pone.0087217.

      Sadoughi, Baptiste, Dominik Schneider, Rolf Daniel, Oliver Schülke, and Julia Ostner. 2022. “Aging Gut Microbiota of Wild Macaques Are Equally Diverse, Less Stable, but Progressively Personalized.” Microbiome 10 (1): 95. https://doi.org/10.1186/s40168-022-01283-2.

      Sender, Ron, Shai Fuchs, and Ron Milo. 2016. “Revised Estimates for the Number of Human and Bacteria Cells in the Body.” PLoS Biology 14 (8): e1002533. https://doi.org/10.1371/journal.pbio.1002533.

      Tung, J., L. B. Barreiro, M. B. Burns, J. C. Grenier, J. Lynch, L. E. Grieneisen, J. Altmann, S. C. Alberts, R. Blekhman, and E. A. Archie. 2015. “Social Networks Predict Gut Microbiome Composition in Wild Baboons.” eLife 4: e05224. https://doi.org/10.7554/eLife.05224.

      Weibel, Chelsea J., Mauna R. Dasari, David A. Jansen, Laurence R. Gesquiere, Raphael S. Mututua, J. Kinyua Warutere, Long’ida I. Siodi, Susan C. Alberts, Jenny Tung, and Elizabeth A. Archie. 2024. “Using Non-Invasive Behavioral and Physiological Data to Measure Biological Age in Wild Baboons.” GeroScience 46 (5): 4059–74. https://doi.org/10.1007/s11357-024-01157-5.

    1. Author response:

      We thank the reviewers for their thoughtful reading and review of our manuscript. These reviews make clear that, for this work to be complete, we must make progress on the following fronts:

      (1) Expand the discussion to better incorporate alternate explanations of our data

      (2) Improve data visualization and provide experimental support for, or refutation of, the following concepts:

      a. Lactate exported specifically from photoreceptors is utilized in the RPE TCA cycle

      b. Photoreceptors can utilize lactate as a fuel source when starved of glucose

      To address these concerns, we will focus our efforts on infusing <sup>13</sup>C<sub>6</sub>-glucose into rodΔglut1 mice. Lactate is not made without glucose, so this experiment should indicate whether glucose utilization in photoreceptors provides lactate to the RPE, and whether that lactate is used in the TCA cycle.

      The reviewers also noted that changes in <sup>13</sup>C labeling of RPE TCA cycle intermediates downstream of lactate are not obvious (between C57BL/6J and AIPL1<sup>-/-</sup> mice). We think that, at least in part, this is a consequence of the way we presented the data. We will improve how we display our data so that differences in <sup>13</sup>C incorporation into TCA cycle intermediates between control and AIPL1<sup>-/-</sup> RPE are clearer.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      This paper examines the role of MLCK (myosin light chain kinase) and MLCP (myosin light chain phosphatase) in axon regeneration. Using loss-of-function approaches based on small molecule inhibitors and siRNA knockdown, the authors explore axon regeneration in cell culture and in animal models from central and peripheral nervous systems. Their evidence shows that MLCK activity facilitates axon extension/regeneration, while MLCP prevents it.

      Major concerns:

      (1) In the title, authors indicate that the observed effects from loss-of-function of MLCK/MLCP take place via F-actin redistribution in the growth cone. However, there are no experiments showing a causal effect between changes in axon growth mediated by MLCK/MLCP and F-actin redistribution.

      Thank you for your comments. We revised the title of our manuscript to “MLCK/MLCP regulates mammalian axon regeneration and redistributes the growth cone F-actin”. (line 3)

      (2) The authors combine MLCK inhibitors with Bleb (Figure 6), trying to verify whether both pairs of inhibitors act on the same target/pathway. MLCK may regulate axon growth independently of NMII activity. However, this has very important implications not only for understanding how NMII works and affects axon extension, but also for understanding what MLCP is doing. One wonders whether MLCP actions, which oppose those of MLCK, are also independent of NMII activity. In the discussion section, the authors try to find an explanation for this finding, but I consider that it fails, since the whole rationale of the manuscript still revolves around how MLCK and MLCP affect NMII phosphorylation.

      Thank you for your comments. Although both MLCK and MLCP regulate the activity of NMII, they have also been reported to govern domain-specific spatial control of actin-based motility in the growth cone. Specifically, MLCK activity is essential for arc translocation and retrograde flow within the P domain, while MLCP appears to specifically modulate arc movement and associated myosin II contractility in the T zone and C domain (Ref). We therefore propose that the mechanisms by which MLCK and MLCP regulate axon growth are highly complex.

      [Ref]: Xiao-Feng Zhang, Andrew W Schaefer, Dylan T Burnette, Vincent T Schoonderwoert, Paul Forscher. Rho-dependent contractile responses in the neuronal growth cone are independent of classical peripheral retrograde actin flow. Neuron. 2003 Dec 4;40(5):931-44.

      What follows is a discussion of the merits and limitations of different claims of the manuscript in light of the evidence presented.

      (1) Using western blot and immunohistochemical analyses, authors first show that MLCK expression is increased in DRG sensory neurons following peripheral axotomy, concomitant to an increase in MLC phosphorylation, suggesting a causal effect (Figure 1). The authors claim that it is common that axon growth-promoting genes are upregulated. It would have been interesting at this point to study in this scenario the regulation of MLCP.

      We thank the reviewer for the positive comment on our manuscript.

      (2) Using DRG cultures and sciatic nerve crush in the context of MLCK inhibition (ML-7) and down-regulation, authors conclude that MLCK activity is required for mammalian peripheral axon regeneration both in vitro and in vivo (Figure 2). In parallel, the authors show that these treatments affect as expected the phosphorylation levels of MLC.

      The in vitro evidence relies on standard methods and is convincing. However, here, as well as in all other experiments using siRNAs, no control siRNAs were used. The authors do show that the target protein is downregulated, and they can follow transfected cells with GFP. Still, it should be noted that the standard control for these experiments has not been done.

      Thank you for your comments. We used scrambled siRNA as a control. We sincerely apologize for the oversight: although we stated in the figure legends that scrambled siRNA was used as a control, we failed to state this important information clearly in the Methods section. We have revised the manuscript accordingly. (line 87, line 549, line, line 562, line 568).

      (3) The authors then examined the role of the phosphatase MLCP in axon growth during regeneration. The authors first use a known MLCP blocker, phorbol 12,13-dibutyrate (PDBu), to show that is able to increase the levels of p-MLC, with a concomitant increase in the extent of axon regrowth of DRG neurons, both in permissive as well as non-permissive substrates. The authors repeat the experiments using the knockdown of MYPT1, a key component of the MLC-phosphatase, and again can observe a growth-promoting effect (Figure 3).

      The authors further show evidence for the growth-enhancing effect in vivo, in nerve crush experiments. The in vivo evidence requires more data and experimental details (see comment 2). A key weakness of the data was mentioned previously: no control siRNA was used.

      Thank you for your comments. As mentioned above, we used scrambled siRNA as a control in the in vivo experiments as well.

      (4) In the next set of experiments (presented in Figure 4), the authors extend the previous observations to primary cultures from the CNS. For that, they use cortical and hippocampal cultures, and pharmacological and genetic loss-of-function using the above-mentioned strategies. The expected results were obtained in both types of CNS neurons: inhibition or knockdown of the kinase decreases axon growth, whereas inhibition or knockdown of the phosphatase increases growth. A main weakness in this set is that drugs were used from the beginning of the experiment, and hence they would also affect axon specification. As pointed out in Materials and Methods (lines 143-145), the authors counted as "axons" neurites longer than twice the diameter of the cell soma, and hence this would not affect the variable measured. In any case, to be sure one is only affecting axon extension in these cells, the drugs should have been used after axon specification and maturation, which occurs at least after 5 DIV.

      Thank you for your comments. We acknowledge that early administration of drugs can have unintended effects on neuronal polarization and axon formation. However, in line with our previous publications, we focused exclusively on measuring the length of the longest axon. To quantify axon length, we selected neurons with an axonal process longer than twice the diameter of their cell body and measured the longest axon of 100 neurons for each condition (Ref 1, Ref 2). Therefore, although drug administration at the onset of cell culture may influence axon formation, we believe it does not significantly affect the measured impact of the drug on axon length.

      [Ref 1]: Chang-Mei Liu, Rui-Ying Wang, Saijilafu, Zhong-Xian Jiao, Bo-Yin Zhang, Feng-Quan Zhou. MicroRNA-138 and SIRT1 form a mutual negative feedback loop to regulate mammalian axon regeneration. Genes Dev. 2013 Jul 1;27(13):1473-83.

      [Ref 2]: Eun-Mi Hur, Saijilafu, Byoung Dae Lee, Seong-Jin Kim, Wen-Lin Xu, Feng-Quan Zhou. GSK3 controls axon growth via CLASP-mediated regulation of growth cone microtubules. Genes Dev. 2011 Sep 15;25(18):1968-81.

      (5) In Figure 7, the authors suggest a local cytoskeletal action of the drug, but the evidence provided does not differentiate between a localized action of the drugs and a localized cell activity.

      We appreciate the reviewer’s insightful comments and have revised our title to “MLCK/MLCP Regulates mammalian axon regeneration and redistributes growth cone F-actin.” We have also made corresponding revisions to the manuscript (lines 31 and 73).

      References:

      (1) Eun-Mi Hur, In Hong Yang, Deok-Ho Kim, Justin Byun, Saijilafu, Wen-Lin Xu, Philip R Nicovich, Raymond Cheong, Andre Levchenko, Nitish Thakor, Feng-Quan Zhou. 2011. Engineering neuronal growth cones to promote axon regeneration over inhibitory molecules. Proc Natl Acad Sci U S A. 2011 Mar 22;108(12):5057-62. doi: 10.1073/pnas.1011258108.

      (2) Garrido-Casado M, Asensio-Juárez G, Talayero VC, Vicente-Manzanares M. 2024. Engines of change: Nonmuscle myosin II in mechanobiology. Curr Opin Cell Biol. 2024 Apr;87:102344. doi: 10.1016/j.ceb.2024.102344.

      (3) Karen A Newell-Litwa, Rick Horwitz, Marcelo L Lamers. 2015. Non-muscle myosin II in disease: mechanisms and therapeutic opportunities. Dis Model Mech. 2015 Dec;8(12):1495-515. doi: 10.1242/dmm.022103.

      Reviewer #2 (Public review):

      Summary:

      Saijilafu et al. demonstrate that MLCK/MLCP proteins promote axonal regeneration in both the central nervous system (CNS) and peripheral nervous system (PNS) using primary cultures of adult DRG neurons, hippocampal and cortical neurons, as well as in vivo experiments involving sciatic nerve injury, spinal cord injury, and optic nerve crush. The authors show that axon regrowth is possible across different contexts through genetic and pharmacological manipulation of these proteins. Additionally, they propose that MLCK/MLCP may regulate F-actin reorganization in the growth cone, which is significant as it suggests a novel strategy for promoting axonal regeneration.

      Strengths:

      This manuscript presents a wide range of experimental models to address its hypothesis and biological question. Notably, the use of multiple in vivo models significantly enhances the overall validity of the study.

      We thank the reviewer for the positive comment on our manuscript.

      Weaknesses:

      - The authors previously published that blocking myosin II activity stimulates axonal growth and that MLCK activates myosin II. The present work shows that inhibiting MLCK blocks axonal regeneration while blocking MLCP (the protein that dephosphorylates myosin II) produces the opposite effect. Although this contradiction is discussed, no new evidence has been added to the manuscript to clarify this mechanism or address the remaining questions. Critical unresolved questions include: what happens to myosin II expression when both MLCK and MLCP are inhibited? If MLCK/MLCP are acting through an independent mechanism, what would that mechanism be?

      - In the discussion, the authors mention the existence of two myosin II isoforms with opposing effects on axonal growth. Still, there is no evidence in the manuscript to support this point.

      - It is also unclear how MLCK/MLCP acts on the actin cytoskeleton. The authors suggest that proteins such as ADF/cofilin, Arp 2/3, Eps8, Profilin, Myosin II, and Myosin V could regulate changes in F-actin dynamics. However, this study provides no experimental evidence to determine which proteins may be involved in the mechanism.

      Thank you for your comments. Axon growth is an exceptionally intricate process, facilitated by the coordinated regulation of gene expression in the soma, axonal transport along the shaft, and the assembly of cytoskeletal elements and membrane proteins at the growth cone. In this paper, our results primarily demonstrate that MLCK/MLCP plays a crucial role in regulating mammalian axon regeneration and redistributing F-actin within the growth cone; however, we did not investigate which specific proteins act downstream of MLCK/MLCP during axon regeneration.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      - A title more suitable for the evidence shown can be: MLCK/MLCP regulates mammalian axon regeneration and redistributes the growth cone F-actin.

      Thank you for your comments. We revised the title of our manuscript to “MLCK/MLCP regulates mammalian axon regeneration and redistributes the growth cone F-actin” (line 3).

      - In Figure 3, it would be useful to indicate in the figure legend that the red arrow points to a suture placed during surgery to clearly mark the injury site.

      Thank you for your comments. We revised the Figure 3 legend to indicate that the red arrow points to a suture placed during surgery to clearly mark the injury site (lines 571-572).

      - The following concern was raised in the previous round, and the authors' response to it was so complete and accurate that I think it would be useful to include it in the Discussion section.

      Thank you for your comments. We have included this content in the Discussion section of our revised manuscript (lines 348-354 and 355-359).

      The authors combine MLCK inhibitors with Bleb (Figure 6) to verify whether both pairs of inhibitors act on the same target/pathway. The rationale is flawed for at least two reasons.

      a- Because both lines of evidence point to contrasting actions of NMII on axon growth, one approach could never "rescue" the other.

      Reply by authors in R1: If MLCK regulates axon growth through the activation of myosin, the inhibitory effect of ML-7 (an MLCK inhibitor) on axon growth might be influenced by Bleb, an NMII inhibitor. However, our findings reveal that combining Bleb and ML-7 does not alter the rate of axon outgrowth compared with ML-7 alone. This suggests that ML-7 and Bleb act on axon growth independently, and that MLCK may regulate axon growth independently of NMII activity.

      b- Because the approaches target different steps of NMII activation, one could never "prevent" or rescue the other. For example, for Bleb to produce a phenotype, p-MLC must be present, because only that form of MLC is capable of inhibiting its ATPase site. In light of this, it is not surprising that Bleb is unable to exert any action in a situation where there is no p-MLC (ML-7, which by inhibiting the kinase drives the levels of p-MLC to zero, Figure 4A). Hence, the results cannot be validated under the authors' current general interpretation (see 'major concern').

      Reply by authors in R1: The reported mechanism of blebbistatin is not through competition with the ATP binding site of myosin. Instead, it selectively binds to the ATPase intermediate state associated with ADP and inorganic phosphate, which decelerates the phosphate release. Importantly, blebbistatin does not impede myosin's interaction with actin or the ATP-triggered disassociation of actomyosin. It rather inhibits the myosin head when it forms a product complex with a reduced affinity for actin. This indicates that blebbistatin functions by stabilizing a particular myosin intermediate state that is independent of the phosphorylation status of myosin light chain (MLC).

      [Ref] Kovács M, Tóth J et al. Mechanism of blebbistatin inhibition of myosin II. J Biol Chem. 2004 Aug 20;279(34):35557-63.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Liu et al. present an immersion objective adapter design called RIM-Deep, which can be used to enhance axial resolution and reduce spherical aberrations during inverted confocal microscopy of thick cleared tissue.

      Strengths:

      RI mismatches present a significant challenge to deep tissue imaging, and developing a robust immersion method is valuable in preventing losses in resolution. Liu et al. present data showing that RIM-Deep is suitable for tissue cleared with two different clearing techniques, demonstrating the adaptability and versatility of the approach.

      Thank you for your feedback. In fact, we used three distinct clearing techniques, iDISCO, CUBIC, and MACS, to demonstrate the adaptability and versatility of the RIM-Deep adapter.

      Weaknesses:

      Liu et al. claim to have developed a useful technique for deep tissue imaging, but in its current form, the paper does not provide sufficient evidence that their technique performs better than existing ones.

      We fully agree with your recommendation. In additional experiments, we will thoroughly compare the performance of the RIM-Deep adapter and the official adapter in fluorescent bead experiments and in tissue cleared with the CUBIC and MACS techniques.

      Reviewer #1 (Recommendations for the authors):

      Suggestions for improvement:

      Major revisions:

      (1) For the bead experiment, the comparison was made to a 10X dry objective instead of an immersion objective, please make a comparison to the standard immersion objective.

      Thank you for your suggestion. We fully agree that a comparison with the standard immersion objective is needed. We plan to conduct this comparison in future experiments and will thoroughly analyze the imaging differences between the official adapter and the RIM-Deep adapter.

      (2) It is unclear if an accurate comparison of objectives (same NA etc) is being made in Fig 1G-J, since the official adapter image appears to be of lower resolution even at the surface. At the very least, progressive 2D slices of the reconstruction must be shown for both adapters instead of just the RIM-Deep adapter.

      Thank you for your suggestion. We strictly controlled the numerical aperture (NA) of the objectives in Fig 1G-J to ensure the accuracy of the comparison, so the nominal imaging resolution of the official adapter is the same as that of the RIM-Deep adapter. We agree that showing progressive 2D slices of the reconstruction for both adapters would provide a more comprehensive comparison.

      (3) Similarly, since there already exists an official adapter, it would be useful to see that RIM-Deep performs better even in the mouse tissue, since the clearing method was different.

      Thank you for your suggestion. We will investigate the imaging performance of both the official adapter and the RIM-Deep adapter with the two additional tissue clearing protocols.

      (4) The movies need legends, as it is unclear if they even show 2-D slices very deep into the tissue.

      Thank you for your suggestion. We will add figure legends to each movie.

      (5) The purpose of Supplementary Figure 3 in its current form is unclear, as is the statement in the text related to it : "The effectiveness and utility of this adapter configuration have been substantiated through a comprehensive series of experimental validations".

      Thank you for your suggestion. We will revise the statement to: "We validated the effectiveness and utility of this adapter configuration through a series of experiments."

      (6) The system is variably referred to as RIM-Deep or DepthView Enhancer in the text and figures, it would be beneficial to the readers if the authors stuck to one name.

      Thank you for your suggestion. We will choose RIM-Deep as the sole name.

      Minor revisions

      Figures

      (1) "Confocal" is misspelled as "confocol" in Figure 1, and "media" is misspelled in multiple places.

      Thank you. We will correct these errors.

      (2) The camera is misplaced in the Figure 1A drawing.

      Thank you. We will fix this issue.

      (3) It would be useful to have actual pictures of the immersion objective setup (both RIM-Deep and the pre-existing adapter) since the diagrams are not very clear.

      Thank you. We will include actual pictures of both the RIM-Deep and the pre-existing adapter in the supplementary materials.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study by Popli et al. evaluated the function of Atg14, an autophagy protein, in reproductive function using a conditional knockout mouse model. The authors showed that female mice lacking Atg14 were infertile partly due to defective embryo transport function of the oviduct and faulty uterine receptivity and decidualization using PgrCre/+;Atg14f/f mice. The findings from this work are exciting and novel. The authors demonstrated that a loss of Atg14 led to an excessive pyroptosis in the oviductal epithelial cells that compromises cellular integrity and structure, impeding the transport function of the oviduct. In addition, the authors use both genetic and pharmacological approaches to test the hypothesis. Therefore, the findings from this study are high-impact and likely reproducible. However, there are multiple major concerns that need to be addressed to improve the quality of the work.

      Thank you for the additional data that solidified the conclusion of this study. The authors addressed almost all of my previous concerns in this revised manuscript. However, the wording of some key points still needs to be addressed.

      Comments on revisions:

      In Fig. 2A, please confirm that these are 5.0 dpc samples, since implantation has already occurred at this stage; however, the embryo appears to be free-floating adjacent to the luminal epithelial cells (LE), even in the control.

      We understand the reviewer’s concern. We have now replaced the previous H & E image with a clearer, higher-quality section that shows a fully attached embryo within a closed uterine lumen representing a typical implantation morphology at the D5 stage of pregnancy. (Revised Figure 2A)

      Fig. 3A-B: "Approximately 80-90% of blastocysts" contradicts the quantification in Figure 3C, which showed a percentage of blastocysts below 50%. Please clarify and correct as needed.

      In Fig. 3A-B, we meant approximately 80-90% of embryos. We have now corrected the statement in the revised manuscript (lines 349-351).

      The authors showed that acetylated α-tubulin was present in the ampulla region of cKO oviducts (Fig. 4A). However, the revised manuscript still states (lines 397-399) that "...there was a substantial loss of the ciliary epithelial cells (indicated by fewer α-tubulin and FOXJ1-positive cells) (Fig. 4B, left panel and Fig. S3)...". The authors may therefore want to tone down their conclusion regarding a "substantial loss" of ciliated epithelial cells unless the number of ciliated cells is quantified.

      We thank the reviewer for this suggestion. To avoid redundancy and ambiguity, we have revised the statement as below (Line no: 391-395):

      “As shown in Fig. 4A, normal ciliary structures were observed in the ampulla of both control and cKO oviducts. However, in the isthmus of cKO oviducts, we observed a reduction in both the FOXJ1- and PAX8-expressing cells (Fig. 4B, and Fig. S3).”

      Fig. 4C: the areas in the red inset boxes labeled as isthmus are not really isthmus (in both control and cKO). The zoomed-in images (Fig. 4C, far-right panel for both control and cKO) show the transitional zone from the ampulla to the isthmus. The isthmus should have a thick muscle layer with almost no ciliated cells (see Fig. 4B cKO; those are true isthmus areas).

      We thank the reviewer for noting this. We have corrected the label accordingly. Since ciliary epithelial cells predominantly reside in the ampulla, we have included high-resolution images specifically for the ampulla regions.

      • In Fig. 3A and 3C, the images appear to have been taken at different magnifications, but both scale bars indicate 200 µm. Please double-check the scale bars.

      We thank the reviewer for noting this. We have double-checked all the figures to ensure the scale bars are correctly displayed and aligned with the resolution.

      • Fig. 6D: why do the polyphyllin-treated samples not sum to 100%? Please double-check.

      Since approximately 50% of the embryos were retained in the oviduct following polyphyllin treatment (Figure 6C, upper bar), the bar in Figure 6D represents this percentage (50% retained) rather than 100%.

      Reviewer #2 (Public review):

      In this manuscript, Popli et al investigated the roles of autophagy-related gene, Atg14, in the female reproductive tract (FRT) using conditional knockout mouse models. By ablation of Atg14 in both oviduct and uterus with PR-Cre (Atg14 cKO), authors discovered that such females are completely infertile. They went on to show that Atg14 cKO females have impaired embryo implantation as well as embryo transport from oviduct to uterus. Further analysis showed that Atg14 cKO leads to increased pyroptosis in oviduct, which disrupts oviduct epithelial integrity and leads to obstructive oviduct lumen and impaired embryo transport. The authors concluded that Atg14 is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable proper embryo transport.

      The authors have addressed most of my concerns in this revised version; a few minor issues remain to be addressed:

      (1) The authors tried to address my first concern regarding the statement that "autophagy is critical for maintaining the oviduct homeostasis". The revised statement in Lines 53-54 "we report that Atg14-dependent autophagy plays a crucial role in maintaining..." is still not correct. It should be corrected as " we report that autophagy-related protein Atg14 plays a crucial role in maintaining...".

      We thank the reviewer for this nice suggestion. We have now modified the statement as suggested (Line no: 54).

      (2) Lines 349-351 describe 80-90% of blastocysts retrieved from the oviducts of cKO mice, which is inconsistent with Figure 3B (showing more than 98%).

      We thank the reviewer for noting this. We have now corrected the statement as: “Unexpectedly, oviduct flushing from cKO mice resulted in the retrieval of approximately 90% of embryos, suggesting their potential entrapment within the oviducts, impeding their transit to the uterus”. (Line No: 349-351).

      (3) Line 447, "Fig. 5E" should be Fig. 6A. In addition, grammar error in the next sentence.

      We have corrected the figure number and addressed the grammatical error.

      (4) In Figure 6D, why does the composition of blastocysts in the chemical-treated group not add up to 100%?

      As explained in our response to Reviewer 1, the bar in Figure 6D represents the ~50% of embryos retained in the oviduct (Figure 6C, upper bar) rather than the full count.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Pooja Popli and co-authors tested the importance of Atg14 in the female reproductive tract by conditionally deleting Atg14 using PrCre and also Foxj1cre. The authors showed that loss of Atg14 leads to infertility due to the retention of embryos within the oviduct. The authors further concluded that the retention of embryos within the oviduct is due to pyroptosis in oviduct cells leading to defective cellular integrity. The revised manuscript includes new experimental data (Figs. S2B, 5B, 5C, and S3) that satisfied the concerns of this reviewer. The manuscript should provide an important advance to the field.

      We sincerely thank the reviewer for the thoughtful evaluation of our manuscript and appreciate your constructive feedback.

    1. Author response:

      We appreciate the reviewers' thoughtful consideration of our manuscript and their recognition of the variety of experimental and computational approaches we have brought to bear in probing the very challenging question of uncoupled proton leak through EmrE.

      We did record SSME measurements with MeTPP+, a small-molecule substrate, at two different protein:lipid ratios. These experiments report the rate of net flux when both proton-coupled substrate antiport and substrate-gated proton leak are possible. We will add these data to the revision, including data acquired at different lipid:protein ratios, which confirm that we are detecting transport rather than binding. In brief, these data show that net flux is highly dependent on both proton concentration (pH) and drug-substrate concentration, as predicted by our mechanistic model. This demonstrates that both types of transport contribute to net flux when small-molecule substrates are present.

      In the absence of drug-substrate, proton leak is the only possible transport pathway. The pyranine assay directly assesses proton leak under these conditions and unambiguously shows faster proton entry into proteoliposomes through the ∆107-EmrE mutant than through WT EmrE, with the rate of proton entry into ∆107-EmrE proteoliposomes matching the rate achieved by the protonophore CCCP. We have revised the text to emphasize more clearly that this assay measures proton leak directly, independently of any other type of transport activity. The SSME experiments with a proton gradient only (no small-molecule substrate present) provide additional data on shorter timescales that are consistent with the pyranine data. The consistency of the data across multiple LPRs, together with the comparison of transport to proton leak in the SSME assays, further supports the importance of the C-terminal tail in determining the rate of flux.

      None of the current structural models have good resolution (crystallography, EM) or sufficient restraints (NMR) to define the loop and tail conformations sufficiently for comparison with this work. We are in the process of refining an experimental structure of EmrE with better resolution of the loop and tail regions implicated in proton-entry and leak. Direct assessment of structural interactions via mutagenesis is complicated because of the antiparallel homodimer structure of EmrE. Any point mutation necessarily affects both subunits of the dimer, and mutations designed to probe the hydrophobic gate on the more open face of the transporter also have the potential to disrupt closure on the opposite face, particularly in the absence of sufficient resolution in the available structures. Thus, mutagenesis to test specific predicted structural features is deferred until our structure is complete so that we can appropriately interpret the results.

      In our simulation setup, the MD results can be considered representative and meaningful for two reasons. First, the C-terminal tail, which was not present in the prior structure and was therefore modeled by us, is only 4 residues long. We will show in the revision and detailed response that the system loses memory of its previous conformation very quickly, such that velocity initialization alone is enough to provide a diverse starting point. Second, our simulation is more akin to simulated annealing: starting from a high-free-energy state, it shows that, given such random initialization, the final tail conformation is consistent with what we reported. It is also difficult to sample back-and-forth tail motion within a realistic MD timescale. Therefore, causally inferring allosteric motions from unbiased MD of the wild type alone may be inconclusive. The most viable approach is to compare the equilibrium statistics of the most stable states of WT- and ∆107-EmrE.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This descriptive manuscript builds on prior research showing that the elimination of Origin Recognition Complex (ORC) subunits does not halt DNA replication. The authors use various methods to genetically remove one or two ORC subunits from specific tissues and observe continued replication, though it may be incomplete. The replication appears to be primarily endoreduplication, indicating that ORC-independent replication may promote genome reduplication without mitosis. Despite similar findings in previous studies, the paper provides convincing genetic evidence in mice that liver cells can replicate and undergo endoreduplication even with severely depleted ORC levels. While the mechanism behind this ORC-independent replication remains unclear, the study lays the groundwork for future research to explore how cells compensate for the absence of ORC and to develop functional approaches to investigate this process. The reviewers agree that this valuable paper would be strengthened significantly if the authors could delve a bit deeper into the nature of replication initiation, potentially using an origin mapping experiment. Such an exciting contribution would help explain the nature of the proposed new type of Mcm loading, thereby increasing the impact of this study for the field at large.

      We appreciate the reviewers’ suggestion. To date, we know of only one paper that has mapped origins of replication in regenerating mouse liver, and it was published two months ago in Cell (PMID: 39293447). We would like to adopt this method, but we do not need it to answer the question asked. We have mapped origins of replication in ORC-deleted cancer cell lines and compared them with wild-type cells in Shibata et al., bioRxiv (PMID: 39554186) (currently under review). We report the following: mapping of origins in cancer cell lines that are wild type or engineered to delete ORC1, ORC2 or ORC5 shows that specific origins are still used and are mostly at the same sites in the genome as in wild-type cells. Of the 30,197 origins in wild-type cells (with ORC), only 2,466 (8%) are not used in any of the three ORC-deleted cell lines, and 18,319 (60%) are common to all four cell types. Despite the lack of ORC, excess MCM2-7 is still loaded at comparable rates in G1 phase to license reserve origins and is also repeatedly loaded in the same S phase to permit re-replication.
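      As a quick arithmetic check on the origin counts above (the fractions below are computed here from the stated numbers, not taken from the preprint), the shares of the 30,197 wild-type origins work out as:

```latex
\frac{2{,}466}{30{,}197} \approx 0.082 = 8.2\%, \qquad
\frac{30{,}197 - 2{,}466}{30{,}197} = \frac{27{,}731}{30{,}197} \approx 0.918 = 91.8\%, \qquad
\frac{18{,}319}{30{,}197} \approx 0.607 = 60.7\%
```

      The middle term is simply the complement of the first: any wild-type origin that is not lost in all three ORC-deleted lines is, by definition, still used in at least one of them.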

      Citation: Specific origin selection and excess functional MCM2-7 loading in ORC-deficient cells. Yoshiyuki Shibata, Mihaela Peycheva, Etsuko Shibata, Daniel Malzl, Rushad Pavri, Anindya Dutta. bioRxiv 2024.10.30.621095; doi: https://doi.org/10.1101/2024.10.30.621095 (PMID: 39554186)

      We have now included this in the discussion.

      Public Reviews:

      Reviewer #1 (Public review):

      The origin recognition complex (ORC) is an essential loading factor for the replicative Mcm2-7 helicase complex. Despite ORC's critical role in DNA replication, there have been instances where the loss of specific ORC subunits has still seemingly supported DNA replication in cancer cells, endocycling hepatocytes, and Drosophila polyploid cells. Critically, all tested ORC subunits are essential for development and proliferation in normal cells. This presents a challenge, as conditional knockouts need to be generated, and a skeptic can always claim that there were limiting but sufficient ORC levels for helicase loading and replication in polyploid or transformed cells. That being said, the authors have consistently pushed the system to demonstrate replication in the absence or extreme depletion of ORC subunits.

      Here, the authors generate conditional ORC2 mutants to counter a potential argument with prior conditional ORC1 mutants that Cdc6 may substitute for ORC1 function based on homology. They also generate a double ORC1 and ORC2 mutant, which is still capable of DNA replication in polyploid hepatocytes. While this manuscript provides significantly more support for the ability of select cells to replicate in the absence or near absence of select ORC subunits, it does not shed light on a potential mechanism.

      The strengths of this manuscript are the mouse genetics and the generation of conditional alleles of ORC2 and the rigorous assessment of phenotypes resulting from limiting amounts of specific ORC subunits. It also builds on prior work with ORC1 to rule out Cdc6 complementing the loss of ORC1.

      The weakness is that it is a very hard task to resolve the fundamental question of how much ORC is enough for replication in cancer cells or hepatocytes. Clearly, there is a marked reduction in specific ORC subunits that is sufficient to impact replication during development and in fibroblasts, but the devil's advocate can always claim minimal levels of ORC remaining in these specialized cells.

      The significance of the work is that the authors keep improving their conditional alleles (and combining them), thus making it harder and harder (but not impossible) to invoke limiting but sufficient levels of ORC. This work lays the foundation for future functional screens to identify other factors that may modulate the response to the loss of ORC subunits.

      This work will be of interest to the DNA replication, polyploidy, and genome stability communities.

      Thank you.

      Reviewer #2 (Public review):

      This manuscript proposes that primary hepatocytes can replicate their DNA without the six-subunit ORC. This follows previous studies that examined mice that did not express ORC1 in the liver. In this study, the authors suppressed expression of ORC2 or ORC1 plus ORC2 in the liver.

      Comments:

      (1) I find the conclusion of the authors somewhat hard to accept. Biochemically, ORC without the ORC1 or ORC2 subunits cannot load the MCM helicase on DNA. The question arises whether the deletion in the ORC1 and ORC2 genes by Cre is not very tight, allowing some cells to replicate their DNA and allow the liver to develop, or whether the replication of DNA proceeds via non-canonical mechanisms, such as break-induced replication. The increase in the number of polyploid cells in the mice expressing Cre supports the first mechanism, because it is consistent with few cells retaining the capacity to replicate their DNA, at least for some time during development.

      In our study, we used EYFP as a marker for Cre recombinase activity. ~98% of the hepatocytes in tissue sections and cells in culture express EYFP, suggesting that the majority of hepatocytes successfully expressed the Cre protein to delete the ORC1 or ORC2 genes. To assess deletion efficiency, we employed sensitive genotyping and Western blotting techniques to confirm the deletion of ORC1 and ORC2 in hepatocytes isolated from Alb-Cre mice. Results in Fig. 2C and Fig. 6D demonstrate the near-complete absence of ORC2 and ORC1 proteins, respectively, in these hepatocytes.

      The mutant hepatocytes underwent at least 15–18 divisions during development. The inherited ORC1 or ORC2 protein present during the initial cell divisions would be diluted to roughly 1.5% (1/64) of wild-type levels within six divisions, making it highly unlikely to support DNA replication, and yet we observe hepatocyte numbers that suggest robust cell division even after that point.
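      The dilution argument can be made explicit with a simple model, assuming (for illustration only) that the inherited protein pool is split evenly between daughter cells at each division, with no new synthesis and no degradation. If f(n) is the fraction of the initial per-cell ORC1 or ORC2 level remaining after n divisions:

```latex
f(n) = \left(\tfrac{1}{2}\right)^{n}, \qquad
f(6) = \tfrac{1}{64} \approx 1.6\%, \qquad
f(7) = \tfrac{1}{128} \approx 0.8\%, \qquad
f(15) = \tfrac{1}{32{,}768} \approx 0.003\%
```

      Under this model, any protein turnover would only lower these values further.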

      Furthermore, the EdU incorporation data confirm DNA synthesis in the absence of ORC1 and ORC2. Specifically, immunofluorescence showed that both in vitro and in vivo, EYFP-positive hepatocytes (indicating successful ORC1 and ORC2 deletion) incorporated EdU, demonstrating that DNA synthesis can occur without ORC1 and ORC2.

      Finally, the Alb-ORC2f/f mice have 25-37.5% of the number of hepatocyte nuclei compared to WT mice (Table 2).  If that many cells had an undeleted ORC2 gene, that would have shown up in the genotyping PCR and in the Western blots.

      (2) Fig 1H shows that 5 days post infection, there is no visible expression of ORC2 in MEFs with the ORC2 flox allele. However, at 15 days post infection, some ORC2 is visible. The authors suggest that a small number of cells that retained expression of ORC2 were selected over the cells not expressing ORC2. Could a similar scenario also happen in vivo?

      This would not explain the significant incorporation of EdU in hepatocytes that are EYFP positive and have no ORC detectable by Western blot. Also note that for MEFs we deliver Cre by adenovirus infection in vitro, so there is a finite probability that a given cell will not receive the virus, will not express Cre, and will not delete ORC2. In vivo, however, Alb-Cre is expressed in every cell that turns on albumin, so no hepatocyte escapes Cre expression.

      (3) Figs 2E-G shows decreased body weight, decreased liver weight and decreased liver to body weight in mice with recombination of the ORC2 flox allele. This means that DNA replication is compromised in the ALB-ORC2f/f mice.

      It is possible that DNA replication is partially compromised or may slow down in the absence of ORC2. However, it is important to emphasize that livers with ORC2 deletion remain capable of DNA replication, so much so that liver function and life span are near normal. Therefore, some kind of DNA replication has to serve as a compensatory mechanism in the absence of ORC2 to maintain liver function and support regeneration.

      (4) Figs 2I-K do not report the number of hepatocytes, but the percent of hepatocytes with different nuclear sizes. I suspect that the number of hepatocytes is lower in the ALB-ORC2f/f mice than in the ORC2f/f mice. Can the authors report the actual numbers?

      We show in Table 2 that the Alb-Orc2f/f mice have about 25-37.5% of the hepatocytes compared to the WT mice.

      (5) Figs 3B-G do not report the number of nuclei, but percentages, which are plotted separately for the ORC2-f/f and ALB-ORC2-f/f mice. Can the authors report the actual numbers?

      In all the FACS experiments in Fig. 3B-G we collect data for a total of 10,000 nuclei (or cells).  For Fig. 3E-G we divide the 10,000 nuclei into the bottom 40% on the EYFP axis (EYFP low, which is mostly EYFP negative) as the control group, and EYFP high (top 20% on the EYFP axis) test group.  We have described this in the Methods in the revision and labeled EYFP negative and positive as EYFP low and high in the Figures and Figure legends.
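      The percentile gating described above can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the `Nucleus` class, the function names and the simulated data are hypothetical, and each event is assumed to carry an EYFP intensity and a ploidy call. Events are ranked by EYFP, the bottom 40% form the EYFP low control group, the top 20% form the EYFP high test group, and the middle 40% are excluded.

```python
import random
from dataclasses import dataclass


@dataclass
class Nucleus:
    eyfp: float   # EYFP fluorescence intensity (arbitrary units)
    ploidy: int   # DNA content in units of n (2, 4, 8, ...)


def gate_by_eyfp(nuclei, low_frac=0.40, high_frac=0.20):
    """Rank nuclei by EYFP and return (EYFP-low, EYFP-high) groups:
    the bottom low_frac and the top high_frac of all events."""
    ranked = sorted(nuclei, key=lambda nuc: nuc.eyfp)
    n = len(ranked)
    n_low = round(n * low_frac)
    n_high = round(n * high_frac)
    return ranked[:n_low], ranked[n - n_high:]


def polyploid_fraction(group):
    """Fraction of nuclei in a group with DNA content above 2n."""
    return sum(nuc.ploidy > 2 for nuc in group) / len(group)


# Hypothetical simulated data: 10,000 events, purely illustrative.
rng = random.Random(0)
events = [
    Nucleus(eyfp=rng.lognormvariate(0, 1), ploidy=rng.choice([2, 4, 8]))
    for _ in range(10_000)
]
low, high = gate_by_eyfp(events)
print(len(low), len(high))  # 4000 2000
print(f"polyploid: low {polyploid_fraction(low):.2f}, high {polyploid_fraction(high):.2f}")
```

      Gating on rank rather than on a fixed intensity threshold keeps the group sizes constant across samples, at the cost of the group boundaries shifting with each sample's EYFP distribution.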

      (6) Fig 5 shows the response of ORC2f/f and ALB-ORC2f/f mice after partial hepatectomy. The percent of EdU+ nuclei in the ORC2-f/f (aka ALB-CRE-/-) mice in Fig 5H seems low. Based on other publications in the field it should be about 20-30%. Why is it so low here? The very low nuclear density in the ALB-ORC2-f/f mice (Fig 5F) and the large nuclei (Fig 5I) could indicate that cells fire too few origins, proceed through S phase very slowly and fail to divide.

      The percentage of EdU+ nuclei in ORC2f/f mice without Alb-Cre is 8%, while in PMID 10623657 ~10% of wild-type nuclei incorporate EdU at 42 hr post partial hepatectomy (the midpoint of the 36-48 hr post-hepatectomy window used in our study). The important result here is that ORC2f/f mice carrying Alb-Cre (+/-) show significant EdU incorporation. We have also corrected the X-axis labels in 5F, 5I, 7E and 7F to reflect that those measurements were made later than 36 hr post-resection (as indicated in the schematic in Fig. 5A).

      (7) Fig 6F shows that ALB-ORC1f/f-ORC2f/f mice have very severe phenotypes in terms of body weight and liver weight (about one third of wild-type!!). In Fig 6H and 6I, the actual numbers should be presented, not percentages. The fact that there are EYFP negative cells implies that CRE was not expressed in all hepatocytes.

      The liver weight is very dependent on the body weight, so we have to look at the liver-to-body-weight ratio to determine whether the liver is inordinately small, and that ratio is 70% of WT. In females the liver and body weights are low (although in proportion to each other), which may be what the reviewer is referring to. However, the fact that liver weight and body weight are not as low in males suggests that this is a sex-specific (hormonal?) effect and not a DNA replication defect. We had discussed this possibility. We have another paper on bioRxiv (Su et al. doi.org/10.1101/2024.12.18.629220) suggesting that ORC subunits have a significant effect on gene expression, so it is possible that this underlies the sexual dimorphism in phenotype. We have now added this to the discussion.

      The bottom 40% of nuclei on the EYFP axis in the FACS profiles (previously labeled EYFP negative, now called EYFP low) consists mostly of non-hepatocytes that are genuinely EYFP negative. Non-hepatocytes (bile duct cells, endothelial cells, Kupffer cells, blood cells) make up a significant fraction of the cells in the dissociated liver (as can be seen in the single-cell sequencing results in PMID: 32690901). Their presence does not mean that hepatocytes are not expressing Cre. Hepatocytes are nearly 100% EYFP positive, as can be seen in the tissue sections (where hepatocytes occupy most of the visual field) and in cells in culture. Moreover, EYFP negative hepatocyte nuclei in the FACS would not rule out the presence of EYFP in the cytoplasm. The important point from the FACS is that the EYFP high nuclei (which have expressed Cre for the longest period) are polyploid relative to the EYFP low nuclei.

      (8) Comparing the EdU+ cells in Fig 7G versus 5G shows very different numbers of EdU+ cells in the control animals. This means that one of these images is not representative. The higher fraction of EdU+ cells in the double-knockout could mean that the hepatocytes in the double-knockout take longer to complete DNA replication than the control hepatocytes. The control hepatocytes may have already completed DNA replication, which can explain why the fraction of EdU+ cells is so low in the controls. The authors may need to study mice at earlier time points after partial hepatectomy, i.e. sacrifice the mice at 30-32 hours, instead of 40-52 hours.

      The apparent difference that the reviewer comments on stems from differences in nuclear density between the images in Fig. 7G and 5G (also quantitated in Fig. 7F and 5F). The quantitation in Fig. 7H and 5H shows that the percentages of EdU+ cells are comparable (5-8%).

      (9) Regarding the calculation of the number of cell divisions during development: the authors assume that all the hepatocytes in the adult liver are derived from hepatoblasts that express Alb. Is it possible to exclude the possibility that pre-hepatoblast cells that do not express Alb give rise to hepatocytes? For example the cells that give rise to hepatoblasts may proliferate more times than normal giving rise to a higher number of hepatoblasts than in wild-type mice.

      Single-cell sequencing of mouse liver at e11 shows hepatoblasts expressing hepatocyte-specific markers (PMID: 32690901). All the cells annotated in the single-cell sequencing analysis are differentiated cells, arguing against the possibility that undifferentiated endodermal cells (what the reviewer probably means by pre-hepatoblasts) exist at e11. We have added this citation to the paper.

      A review (https://www.ncbi.nlm.nih.gov/books/NBK27068/) states that albumin-expressing hepatoblasts are present before e13: “The differentiation of bi-potential hepatoblasts into hepatocytes or BECs begins around e13 of mouse development. Initially hepatoblasts express genes associated with both adult hepatocytes (Hnf4α, Albumin) ...” Thus, we can be certain that hepatoblasts before e13 express albumin. Our calculation of the number of cell divisions in Table 2 begins from e12.
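      The link between final cell number and number of divisions can be illustrated with a short sketch. It assumes, purely for illustration, that mutant and wild-type livers start from the same pool of Alb-expressing hepatoblasts at e12, that cells divide symmetrically and none are lost, and that nuclei counts stand in for cell counts (this ignores the increased polyploidy and binucleation described above). The function names are hypothetical. Under these assumptions, ending with 25-37.5% of the wild-type number of hepatocyte nuclei (Table 2) corresponds to only about 1.4 to 2 fewer population doublings than wild type.

```python
import math


def doublings(final_count: float, initial_count: float) -> float:
    """Minimum number of population doublings needed to grow from
    initial_count to final_count, assuming no cell death."""
    return math.log2(final_count / initial_count)


def doubling_deficit(relative_count: float) -> float:
    """How many fewer doublings a mutant population has undergone than
    wild type, given its final size as a fraction of wild type and the
    same starting pool of cells (hypothetical simplification)."""
    return doublings(1.0, relative_count)


# Relative hepatocyte nuclei numbers taken from the range quoted in the text.
for frac in (0.25, 0.375):
    print(f"{frac:.1%} of WT nuclei -> {doubling_deficit(frac):.2f} fewer doublings")
```

      Because the deficit grows only logarithmically with the shortfall in cell number, a several-fold reduction in hepatocyte number still implies that the mutant cells went through nearly as many divisions as wild-type cells.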

      The reviewer may be suggesting that ORC deletion leads to the immediate demise of hepatoblasts (despite their having inherited ORC protein from the endodermal cells), causing undifferentiated endodermal cells to persist and proliferate much longer than in normal development. We consider this unlikely; if true, it would be very unexpected, both because it would mean that deletion of ORC immediately kills hepatoblasts (despite a healthy reserve of inherited ORC protein) and because it would imply a novel feedback mechanism by which the death or depletion of hepatoblasts leads to the persistence and proliferation of undifferentiated endodermal cells. We have added the reviewer’s suggestion to the discussion.

      (10) My interpretation of the data is that not all hepatocytes have the ORC1 and ORC2 genes deleted (eg EYFP-negative cells) and that these cells allow some proliferation in the livers of these mice.

      Please see the reply to question #1. Particularly relevant: “Finally, the Alb-ORC2f/f mice have 25-37.5% of the number of hepatocyte nuclei compared to WT mice (Table 2). If that many cells had an undeleted ORC2 gene, that would have shown up in the genotyping PCR and in the Western blots.”

      Reviewer #3 (Public review):

      Summary:

      The authors address the role of ORC in DNA replication and argue that this protein complex is not essential for DNA replication in hepatocytes. They provide evidence that ORC subunit levels are substantially reduced in cells that have been induced to delete multiple exons of the corresponding ORC gene(s) in hepatocytes. They evaluate replication both in purified isolated hepatocytes and in mice after hepatectomy. In both cases, there is clear evidence that DNA replication does not decrease to a level that corresponds with the decrease in detectable ORC subunit and that endoreduplication is the primary type of replication observed. It remains possible that small amounts of residual ORC are responsible for the replication observed, although the authors provide arguments against this possibility. The mechanisms responsible for DNA replication in the absence of ORC are not examined.

      Strengths:

      The authors clearly show that there are dramatic reductions in the amount of the targeted ORC subunits in the cells that have been targeted for deletion. They also provide clear evidence that there is replication in a subset of these cells and that it is likely due to endoreduplication. Although there is no replication in MEFs derived from cells with the deletion, there is clearly DNA replication occurring in hepatocytes (both isolated in culture and in the context of the liver). Interestingly, the cells undergoing replication exhibit enlarged cell sizes and elevated ploidy indicating endoreduplication of the genome. These findings raise the interesting possibility that endoreduplication does not require ORC while normal replication does.

      Weaknesses:

      There are two significant weaknesses in this manuscript. The first is that although there is clearly robust reduction of the targeted ORC subunit, the authors cannot confirm that it is deleted in all cells. For example, the analysis in Fig. 4B would suggest that a substantial number of cells have not lost the targeted region of ORC2. Although the western blots show stronger effects, this type of analysis is notorious for non-linear response curves and no standards are provided. The second weakness is that there is no evaluation of the molecular nature of the replication observed. Are there changes in the amount or location of Mcm2-7 loading that is usually mediated by ORC? Does an associated change in Mcm2-7 loading lead to the endoreduplication observed? After numerous papers from this lab and others claiming that ORC is not required for eukaryotic DNA replication in a subset of cells, we still have no information about an alternative pathway that could explain this observation.

      In Shibata et al. (bioRxiv 2024.10.30.621095; doi: https://doi.org/10.1101/2024.10.30.621095; PMID: 39554186), we do not see a significant deficit in MCM2-7 loading (amount and rate) in cancer cell lines in which we separately deleted the ORC1, ORC2 or ORC5 genes. This is now cited in the discussion.

      The authors frequently use the presence of a Cre-dependent eYFP expression as evidence that the ORC1 or ORC2 genes have been deleted. Although likely the best visual marker for this, it is not demonstrated that the presence of eYFP ensures that ORC2 has been targeted by Cre. For example, based on the data in Fig. 4B, there seems to be a substantial percentage of ORC2 genes that have not been targeted while the authors report that 100% of the cells express eYFP.

      (1) The PCR reactions in Fig. 4B are still contaminated by DNA from non-hepatocyte cells: bile duct cells, endothelial cells, Kupffer cells and blood cells. Microscopy of cultured cells identifies the hepatocytes unequivocally from their morphology, and <2% of the hepatocytes in culture in Fig. 4C are EYFP-.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors should present the data as suggested in the review and reformulate their conclusions. If possible, mice should be examined 30-32 hours after partial hepatectomy.

      Based on the literature, we chose a time point that is consistent with our previous paper (Uchida et al., Genes & Dev).

      Reviewer #3 (Recommendations for the authors):

      (1) It would improve the paper to use single-cell methods (e.g. FISH) to assess the deletion of ORC subunits in the targeted cells.

      This is something we will reserve for future studies.

      (2) The importance of the paper would be increased dramatically by showing that the elimination of ORC changed the location of Mcm2-7 loading. This would be highly likely if the authors' hypothesis that ORC is not involved is true. On the other hand, given ORC's role in origin selection, an observation that the same sites are used but less frequently would support a hypothesis that residual intact ORC is responsible for the replication observed.

      Shibata et al. (PMID: 39554186) has answered this question. The loss of ORC does not change the locations of origins or even the ability to specify origins. We argue that this is what our hypothesis predicts: although ORC is clearly important for MCM loading in yeast and in biochemical experiments, something unexpected is going on in human cells. Either a vanishingly small amount of ORC (undetectable by commonly used methods) can load the full complement of MCM2-7 at a rate comparable to that in wild-type cells, or there is an ORC-independent mechanism of MCM2-7 loading. This is now added to the discussion.